Abstract. We study the following hidden Markov chain model: $Y_i = X_i + \varepsilon_i$, $i = 1, \dots, n+1$, with $(X_i)$ a real-valued, positive recurrent and stationary Markov chain and $(\varepsilon_i)_{1 \le i \le n+1}$ a noise independent of the sequence $(X_i)$ and having a known distribution. We present an adaptive estimator of the transition density based on the quotient of an estimator of the density of $(X_i, X_{i+1})$ and a deconvolution estimator of the density of $X_i$. These estimators are obtained by contrast minimization and model selection. We evaluate the $L^2$ risk and its rate of convergence for ordinary smooth and supersmooth noise with regard to ordinary smooth and supersmooth chains. Some examples are also detailed.

Keywords. Hidden Markov chain; Transition density; Nonparametric estimation; Deconvolution; Model selection; Rate of convergence

1. Introduction 
Let us consider the following model:
(1) $Y_i = X_i + \varepsilon_i, \quad i = 1, \dots, n+1$

where $(X_i)_{i \ge 1}$ is an irreducible and positive recurrent Markov chain and $(\varepsilon_i)_{i \ge 1}$ is a noise independent of $(X_i)_{i \ge 1}$. We assume that $\varepsilon_1, \dots, \varepsilon_n$ are independent and identically distributed random variables with known distribution. This model belongs to the class of hidden Markov models. In contrast with most of the literature on the subject, we are interested in a nonparametric approach to the estimation of the transition of the hidden chain. The problem of estimating the density of $X_i$ from the observations $Y_1, \dots, Y_n$ when the $X_i$ are i.i.d. (known as the convolution model) has been extensively studied, see e.g. Carroll and Hall (1988), Fan (1991), Stefanski (1990), Pensky and Vidakovic (1999), Comte et al. (2006b).

But very few authors study the case where $(X_i)$ is a Markov chain. We can cite Dorea and Zhao (2002), who estimate the density of $Y_i$ in such a model, Masry (1993), who is interested in the estimation of the multivariate density in a mixing framework, and Clémençon (2003), who estimates the stationary density and the transition density of the hidden chain. More precisely, he introduces an estimator of the transition density based on thresholding of a wavelet-vaguelette decomposition and he studies its performance in the case of an ordinary smooth noise (i.e. whose Fourier transform has polynomial decay). Here we are also interested in the estimation of the transition density of $(X_i)$ but we consider a larger class of noise distributions. In Clémençon (2003) there is no study of supersmooth noise (i.e. with exponentially decreasing Fourier transform), such as the Gaussian distribution. However the study of such noise yields interesting rates of convergence, in particular when the chain density is also supersmooth. In the present paper, the four cases (ordinary smooth or supersmooth noise with ordinary smooth or supersmooth chain) are considered.

The aim of this paper is to estimate the transition density $\Pi$ of the Markov chain $(X_i)$ from the observations $Y_1, \dots, Y_n$. To do this we assume that the regime is stationary and we note that $\Pi = F/f$ where $F$ is the density of $(X_i, X_{i+1})$ and $f$ the stationary density. The estimation of $f$ comes down to a problem of deconvolution, as does the estimation of $F$. We use contrast minimization and a model selection method inspired by Barron et al. (1999) to find adaptive estimators of $f$ and $F$. Our estimator of $\Pi$ is then the quotient of the two previous estimators. Note that it is worth finding an adaptive estimator, i.e. an estimator whose risk automatically achieves the minimax rates, because the regularity of the densities $f$ and $F$ is generally very hard to compute, even if the chain can be fully described (case of a diffusion or an autoregressive process).

We study the performance of our estimator by computing the rate of convergence of the $L^2$ risk. We improve the result of Clémençon (2003) (case of an ordinary smooth noise) since we obtain the minimax rate without logarithmic loss. Moreover we observe noteworthy rates of convergence in the case where both the noise and the chain are supersmooth.

The paper is organized as follows. Section 2 is devoted to notations and assumptions while the estimation procedure is developed in Section 3. After describing the projection spaces to which the estimators belong, we define separately the estimator of the stationary density $f$, that of the joint density $F$ and finally the estimator of the transition density $\Pi$. Section 4 states the results obtained for our estimators. To illustrate the theorems, some examples are provided in Section 5, such as the AR(1) model, the Cox-Ingersoll-Ross process or the stochastic volatility model. The proofs are to be found in Section 6.



2. Notations and Assumptions 

For the sake of clarity, we use lowercase letters for dimension 1 and capital letters for dimension 2. For a function $t : \mathbb{R} \to \mathbb{R}$, we denote by $\|t\|$ the $L^2$ norm, that is $\|t\|^2 = \int_{\mathbb{R}} t^2(x)\,dx$. The Fourier transform $t^*$ of $t$ is defined by

$t^*(u) = \int e^{-ixu}\, t(x)\,dx$.

Notice that the function $t$ is the inverse Fourier transform of $t^*$ and can be written $t(x) = \frac{1}{2\pi}\int e^{ixu}\, t^*(u)\,du$. Finally, the convolution product is defined by $(t * s)(x) = \int t(x-y)\,s(y)\,dy$.

In the same way, for a function $T : \mathbb{R}^2 \to \mathbb{R}$, $\|T\|^2 = \iint_{\mathbb{R}^2} T^2(x,y)\,dx\,dy$ and

$T^*(u,v) = \iint e^{-ixu-iyv}\, T(x,y)\,dx\,dy, \qquad (T * S)(x,y) = \iint T(x-z, y-w)\, S(z,w)\,dz\,dw$.

We denote by $t \otimes s$ the function $(x,y) \mapsto (t \otimes s)(x,y) = t(x)s(y)$.

The density of $\varepsilon_i$ is named $q$ and is considered as known. We denote by $p$ the density of $Y_i$. We have $p = f * q$ and then $p^* = f^* q^*$. Similarly if $P$ is the density of $(Y_i, Y_{i+1})$, then $P = F * (q \otimes q)$ and $P^*(u,v) = F^*(u,v)\,q^*(u)\,q^*(v)$.

Now the assumptions on the model are the following:
A1: The function $q^*$ never vanishes.

A2: There exist $s \ge 0$, $b > 0$, $\gamma$ (with $\gamma > 0$ if $s = 0$) and $k_0, k_1 > 0$ such that
$k_0 (x^2+1)^{-\gamma/2} \exp(-b|x|^s) \le |q^*(x)| \le k_1 (x^2+1)^{-\gamma/2} \exp(-b|x|^s)$.
A3: The chain is stationary with (unknown) density $f$.

A4: The chain is geometrically $\beta$-mixing ($\beta_q \le M e^{-\theta q}$), or arithmetically $\beta$-mixing ($\beta_q \le M q^{-\theta}$) with $\theta > 6$.

This last condition is verified as soon as the chain is uniformly ergodic. In the sequel 
we consider the following smoothness spaces: 



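$\mathcal{A}_{\delta,r,a}(l) = \{ f \text{ density on } \mathbb{R} \text{ and } \int |f^*(x)|^2 (x^2+1)^{\delta} \exp(2a|x|^r)\,dx \le l \}$

with $r \ge 0$, $a > 0$, $\delta > 1/2$ if $r = 0$, $l > 0$ and

$\mathcal{A}_{\Delta,R,A}(L) = \{ F \text{ density on } \mathbb{R}^2 \text{ and } \iint |F^*(x,y)|^2 (x^2+1)^{\Delta} (y^2+1)^{\Delta} \exp(2A(|x|^R + |y|^R))\,dx\,dy \le L \}$

with $R \ge 0$, $A > 0$, $\Delta > 1/2$ if $R = 0$, $L > 0$.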

When $r > 0$ (respectively $R > 0$) the function $f$ (resp. $F$) is known as supersmooth, and as ordinary smooth otherwise. In the same way, the noise distribution is called ordinary smooth if $s = 0$ and supersmooth otherwise. The spaces of ordinary smooth functions correspond to classic Sobolev classes, while supersmooth functions are infinitely differentiable. The supersmooth class includes for example normal ($r = 2$) and Cauchy ($r = 1$) densities.
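To make Assumption A2 concrete, the following sketch checks numerically that two standard noise laws fit the two-sided bound: the Laplace density $e^{-|x|}/2$, whose Fourier transform is $1/(1+u^2)$ (ordinary smooth, $s = 0$, $\gamma = 2$), and a centered Gaussian with variance $\tau^2$ (supersmooth, $s = 2$, $\gamma = 0$, $b = \tau^2/2$). The function names, the grid and the value of `tau` are illustrative choices, not part of the paper.

```python
import numpy as np

def a2_envelope(x, gamma, b, s):
    """Envelope (x^2 + 1)^(-gamma/2) * exp(-b |x|^s) appearing in Assumption A2."""
    return (x**2 + 1.0) ** (-gamma / 2.0) * np.exp(-b * np.abs(x) ** s)

def laplace_cf(u):
    """Fourier transform of the Laplace density exp(-|x|)/2."""
    return 1.0 / (1.0 + u**2)

def gaussian_cf(u, tau):
    """Fourier transform of the centered Gaussian density with variance tau^2."""
    return np.exp(-0.5 * (tau * u) ** 2)

u = np.linspace(-20.0, 20.0, 2001)
tau = 0.7  # hypothetical noise standard deviation

# Ratio |q*(u)| / envelope: A2 asks for it to stay between k_0 and k_1.
ratio_laplace = np.abs(laplace_cf(u)) / a2_envelope(u, gamma=2.0, b=0.0, s=0.0)
ratio_gauss = np.abs(gaussian_cf(u, tau)) / a2_envelope(u, gamma=0.0, b=tau**2 / 2.0, s=2.0)
```

In both cases the ratio is identically 1 on the grid, so one may take $k_0 = k_1 = 1$; for the Laplace law the value of $b$ plays no role since $s = 0$.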

3. Estimation procedure 

Since $\Pi = F/f$ we proceed in 3 steps to estimate the transition density $\Pi$. First we find an estimator $\tilde f$ of $f$ (see Subsection 3.2). Then we estimate $F$ by $\tilde F$ (see Subsection 3.3). Finally we estimate $\Pi$ with the quotient $\tilde F/\tilde f$ (Subsection 3.4).

All estimators defined here are projection estimators. We therefore start by describing the projection spaces.

3.1. Projection spaces. Let

$\varphi(x) = \sin(\pi x)/(\pi x)$

and, for $m$ in $\mathbb{N}^*$, $j$ in $\mathbb{Z}$, $\varphi_{m,j}(x) = \sqrt{m}\,\varphi(mx - j)$. Notice that $\{\varphi_{m,j}\}_{j \in \mathbb{Z}}$ is an orthonormal basis of the space of square integrable functions having a Fourier transform with compact support included in $[-\pi m, \pi m]$. In the sequel, we use the following notations:

$S_m = \mathrm{Span}\{\varphi_{m,j}\}_{j \in \mathbb{Z}}; \qquad \mathbb{S}_m = \mathrm{Span}\{\varphi_{m,j} \otimes \varphi_{m,k}\}_{j,k \in \mathbb{Z}}$.
These spaces have particular properties, which are a consequence of the first point of Lemma 3 (see Subsection 6.7):

(2) $\forall t \in S_m, \ \|t\|_\infty \le \sqrt{m}\,\|t\|; \qquad \forall T \in \mathbb{S}_m, \ \|T\|_\infty \le m\,\|T\|$

where $\|t\|_\infty = \sup_{x \in \mathbb{R}} |t(x)|$ and $\|T\|_\infty = \sup_{(x,y) \in \mathbb{R}^2} |T(x,y)|$.
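As an illustration of the sine cardinal basis and of the first inequality in (2), here is a minimal Python sketch. It builds a finite combination $t = \sum_j a_j \varphi_{m,j}$ with arbitrary coefficients (the coefficients, the value of `m` and the evaluation grid are illustrative assumptions) and compares its supremum on the grid with $\sqrt{m}\,\|t\|$, using the fact that $\|t\|$ equals the Euclidean norm of the coefficients by orthonormality.

```python
import numpy as np

def phi(m, j, x):
    """phi_{m,j}(x) = sqrt(m) * sin(pi (m x - j)) / (pi (m x - j))."""
    # np.sinc(z) is the normalized sinc sin(pi z) / (pi z), with value 1 at z = 0
    return np.sqrt(m) * np.sinc(m * x - j)

def expand(coeffs, js, m, x):
    """Evaluate t(x) = sum_j a_j phi_{m,j}(x) for a finite index set js."""
    return sum(a * phi(m, j, x) for a, j in zip(coeffs, js))

m = 3
js = np.arange(-5, 6)
rng = np.random.default_rng(0)
coeffs = rng.normal(size=js.size)      # arbitrary illustrative coefficients

x = np.linspace(-4.0, 4.0, 4001)
t = expand(coeffs, js, m, x)

sup_t = np.max(np.abs(t))              # sup norm of t on the grid
l2_t = np.linalg.norm(coeffs)          # L2 norm of t (orthonormal basis)
bound = np.sqrt(m) * l2_t              # right-hand side of (2)
```

On the grid `sup_t` never exceeds `bound`, in line with (2). The basis also interpolates: $\varphi_{m,j}(k/m) = \sqrt{m}$ if $k = j$ and $0$ for any other integer $k$.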

3.2. Estimation of $f$. Here we estimate $f$, which is the density of the $X_i$'s. It is the classic deconvolution problem. We choose to estimate $f$ by minimizing a contrast. The classical contrast in density estimation is $\frac{1}{n}\sum_{i=1}^n [\|t\|^2 - 2t(X_i)]$. It is not possible to use this contrast here since we do not observe $X_1, \dots, X_n$. Only the noisy data $Y_1, \dots, Y_n$ are available. That is why we use the following lemma.

Lemma 1. For any function $t$, let $v_t$ be the inverse Fourier transform of $t^*/q^*(-\cdot)$, i.e.

$v_t(x) = \frac{1}{2\pi}\int e^{ixu}\,\frac{t^*(u)}{q^*(-u)}\,du$.

Then, for all $1 \le k \le n$,

(1) $\mathbb{E}[v_t(Y_k) \mid X_1, \dots, X_n] = t(X_k)$

(2) $\mathbb{E}[v_t(Y_k)] = \mathbb{E}[t(X_k)]$

The second assertion in Lemma 1 is an obvious consequence of the first one and leads us to consider the following contrast:

$\gamma_n(t) = \frac{1}{n}\sum_{i=1}^n \left[\|t\|^2 - 2 v_t(Y_i)\right]$ with $v_t^* = \frac{t^*}{q^*(-\cdot)}$.

We can observe that $\mathbb{E}\gamma_n(t) = \frac{1}{n}\sum_{i=1}^n [\|t\|^2 - 2\mathbb{E}[v_t(Y_i)]] = \frac{1}{n}\sum_{i=1}^n [\|t\|^2 - 2\mathbb{E}[t(X_i)]] = \|t\|^2 - 2\int t f = \|t - f\|^2 - \|f\|^2$ and then minimizing $\gamma_n(t)$ comes down to minimizing the distance between $t$ and $f$. So we define

(3) $\hat f_m = \arg\min_{t \in S_m} \gamma_n(t)$

or, equivalently,

$\hat f_m = \sum_{j \in \mathbb{Z}} \hat a_j \varphi_{m,j}$ with $\hat a_j = \frac{1}{n}\sum_{i=1}^n v_{\varphi_{m,j}}(Y_i)$.

Actually we should define $\hat f_m = \sum_{|j| \le K_n} \hat a_j \varphi_{m,j}$ because we can estimate only a finite number of coefficients. If $K_n$ is suitably chosen, it does not change the rate of convergence since the additional terms can be made negligible. For the sake of simplicity, we keep the sum over $\mathbb{Z}$. For an example of detailed truncation see Comte et al. (2006).
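The coefficients $\hat a_j$ can be computed by numerical integration. The sketch below is one possible way to do it, assuming a symmetric noise (so that $q^*$ is real and even) and using the fact that the Fourier transform of $\varphi_{m,j}$ is $m^{-1/2} e^{-iju/m}$ on $[-\pi m, \pi m]$ and zero elsewhere, which gives $v_{\varphi_{m,j}}(x) = \frac{\sqrt{m}}{\pi}\int_0^{\pi} \cos(w(mx - j))/q^*(mw)\,dw$. The function names, the quadrature rule and the grid size are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y on the grid x."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def v_phi(m, j, x, q_star, n_grid=4001):
    """Deconvolution transform v_{phi_{m,j}}(x) for a real, even q_star (symmetric noise)."""
    w = np.linspace(0.0, np.pi, n_grid)
    z = m * np.asarray(x, dtype=float)[..., None] - j   # m x - j, one row per point x
    integrand = np.cos(w * z) / q_star(m * w)
    return np.sqrt(m) / np.pi * trapezoid(integrand, w)

def coefficients(y_obs, m, js, q_star):
    """hat a_j = (1/n) sum_i v_{phi_{m,j}}(Y_i) for each index j in js."""
    return np.array([np.mean(v_phi(m, j, y_obs, q_star)) for j in js])

def laplace_cf(u):
    """q* for Laplace noise with density exp(-|x|)/2."""
    return 1.0 / (1.0 + u**2)

def no_noise_cf(u):
    """q* = 1: degenerate case without noise, where v_t = t."""
    return np.ones_like(u)

# Toy observations (hypothetical values) and the corresponding coefficients
y_obs = np.array([0.1, -0.3, 0.5])
a_hat = coefficients(y_obs, m=2, js=[-1, 0, 1], q_star=laplace_cf)
```

When `q_star` is identically 1 the function `v_phi` reduces to $\varphi_{m,j}$ itself, which gives a simple sanity check of the quadrature.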



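Conditionally on $(X_i)$, the variance or stochastic error is

$\mathbb{E}\big[\|\hat f_m - \mathbb{E}[\hat f_m \mid X_1,\dots,X_n]\|^2 \mid X_1, \dots, X_n\big] \le \sum_j \mathrm{Var}\Big[\frac{1}{n}\sum_{i=1}^n v_{\varphi_{m,j}}(Y_i) \,\Big|\, X_1, \dots, X_n\Big] \le \frac{\|\sum_j |v_{\varphi_{m,j}}|^2\|_\infty}{n}$

since $Y_1, \dots, Y_n$ are independent conditionally on $(X_i)$. Then, it follows from Lemma 3 (see Subsection 6.7) that $\|\sum_j |v_{\varphi_{m,j}}|^2\|_\infty = \Delta(m)$ where

(4) $\Delta(m) = \frac{1}{2\pi}\int_{-\pi m}^{\pi m} |q^*(u)|^{-2}\,du$.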

This implies that the order of the variance is $\Delta(m)/n$. That is why we introduce

$\mathcal{M}_n = \left\{ m \ge 1, \ \frac{\Delta(m)}{n} \le 1 \right\}$.
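For a given noise law, $\Delta(m)$ and the resulting collection of models can be computed explicitly. The sketch below does so for Laplace noise, where $q^*(u) = 1/(1+u^2)$ and therefore $\Delta(m) = \frac{1}{\pi}\big(\pi m + \frac{2}{3}(\pi m)^3 + \frac{1}{5}(\pi m)^5\big)$. It assumes integer values of $m$; the function names and the cap `m_max` are illustrative.

```python
import numpy as np

def delta_laplace(m):
    """Closed form of Delta(m) = (1/2pi) * int_{-pi m}^{pi m} (1 + u^2)^2 du (Laplace noise)."""
    a = np.pi * m
    return (a + 2.0 * a**3 / 3.0 + a**5 / 5.0) / np.pi

def delta_numeric(m, q_star, n_grid=20001):
    """Delta(m) by the trapezoidal rule, for any non-vanishing q_star."""
    u = np.linspace(-np.pi * m, np.pi * m, n_grid)
    y = np.abs(q_star(u)) ** (-2.0)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)) / (2.0 * np.pi)

def model_collection(n, delta, m_max=1000):
    """Integers m >= 1 such that Delta(m) / n <= 1 (Delta is increasing in m)."""
    models = []
    for m in range(1, m_max + 1):
        if delta(m) > n:
            break
        models.append(m)
    return models

models = model_collection(1000, delta_laplace)
```

With $n = 1000$ only $m = 1$ and $m = 2$ are retained, since $\Delta(2) \approx 678$ while $\Delta(3) \approx 4914$: the smoother the noise, the faster $\Delta(m)$ grows and the fewer models are allowed.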

To complete the estimation, we choose the best estimator among the collection $(\hat f_m)_{m \in \mathcal{M}_n}$. Let

$\hat m = \arg\min_{m \in \mathcal{M}_n} \{\gamma_n(\hat f_m) + \mathrm{pen}(m)\}$

where pen is a penalty term to be specified later (see Theorem 1). Finally we define $\tilde f = \hat f_{\hat m}$.

3.3. Estimation of the density $F$ of $(X_i, X_{i+1})$. We proceed similarly to the estimation of $f$. To define the contrast to minimize, we use the following lemma:

Lemma 2. For any function $T$, let $V_T$ be the inverse Fourier transform of $T^*/(q^* \otimes q^*)(-\cdot)$, i.e.

$V_T(x,y) = \frac{1}{4\pi^2}\iint e^{ixu+iyv}\,\frac{T^*(u,v)}{q^*(-u)\,q^*(-v)}\,du\,dv$.

Then, for all $1 \le k \le n$,

(1) $\mathbb{E}[V_T(Y_k, Y_{k+1}) \mid X_1, \dots, X_n] = T(X_k, X_{k+1})$

(2) $\mathbb{E}[V_T(Y_k, Y_{k+1})] = \mathbb{E}[T(X_k, X_{k+1})]$

For any function $T$ in $L^2(\mathbb{R}^2)$, we define the contrast

$\Gamma_n(T) = \frac{1}{n}\sum_{i=1}^n \left[\|T\|^2 - 2V_T(Y_i, Y_{i+1})\right]$

whose expectation is equal to $\|T\|^2 - \frac{2}{n}\sum_{k=1}^n \mathbb{E}[T(X_k, X_{k+1})] = \|T - F\|^2 - \|F\|^2$. We can now define

(5) $\hat F_m = \arg\min_{T \in \mathbb{S}_m} \Gamma_n(T)$.



By differentiating $\Gamma_n$, we obtain

$\hat F_m(x,y) = \sum_{j,k} \hat A_{j,k}\,\varphi_{m,j}(x)\,\varphi_{m,k}(y)$ with $\hat A_{j,k} = \frac{1}{n}\sum_{i=1}^n V_{\varphi_{m,j} \otimes \varphi_{m,k}}(Y_i, Y_{i+1})$.

We choose again not to truncate the estimator for the sake of simplicity. We have thus defined a collection of estimators $\{\hat F_m\}$. Note that $V_{t \otimes s}(x,y) = v_t(x)\,v_s(y)$ so that here the variance is of order $\Delta^2(m)/n$, so we introduce

$\mathbb{M}_n = \left\{ m \ge 1, \ \frac{\Delta^2(m)}{n} \le 1 \right\}$.

To define an adaptive estimator we have to select the best model $m$. So let

$\hat M = \arg\min_{m \in \mathbb{M}_n} \{\Gamma_n(\hat F_m) + \mathrm{Pen}(m)\}$

where Pen is a penalty function which is specified in Theorem 2. Finally we consider the estimator $\tilde F = \hat F_{\hat M}$.

3.4. Estimation of $\Pi$. Whereas the estimation of $f$ and $F$ is valid on the whole real line $\mathbb{R}$ or $\mathbb{R}^2$, we estimate $\Pi$ on a compact set $B^2$ only, because we need a lower bound on the stationary density. More precisely, we need to set some additional assumptions:

A5: There exists a positive real $f_0$ such that $\forall x \in B$, $f(x) \ge f_0$.
A6: $\forall x \in B$, $\forall y \in B$, $\Pi(x,y) \le \|\Pi\|_{B,\infty} < \infty$.

Now we set

(6) $\tilde\Pi(x,y) = \frac{\tilde F(x,y)}{\tilde f(x)}$ if $|\tilde F(x,y)| \le n\,|\tilde f(x)|$, and $\tilde\Pi(x,y) = 0$ otherwise.

Here the truncation avoids the too small values of $\tilde f$ in the quotient. Now we evaluate upper bounds for the risk of our estimators.
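The truncated quotient can be sketched on a grid as follows. This is only an illustration: the arrays `F_hat` and `f_hat` stand for values of the two estimators on a grid of $B^2$ and $B$, and the truncation level is passed as a hypothetical parameter `threshold` (for instance the sample size), so the rule implemented is "keep the quotient where $|\tilde F| \le \text{threshold} \cdot |\tilde f|$, and set it to zero elsewhere".

```python
import numpy as np

def transition_estimate(F_hat, f_hat, threshold):
    """Quotient F_hat[i, j] / f_hat[i], set to 0 where |F_hat| > threshold * |f_hat|."""
    F_hat = np.asarray(F_hat, dtype=float)
    f_col = np.asarray(f_hat, dtype=float)[:, None]      # one value of f_hat per row x_i
    keep = np.abs(F_hat) <= threshold * np.abs(f_col)
    safe = np.where(f_col == 0.0, 1.0, f_col)            # avoid dividing by exactly zero
    return np.where(keep, F_hat / safe, 0.0)

# Toy values (hypothetical): the second row has a very small f_hat
F_hat = np.array([[0.2, 0.4],
                  [5.0, 0.0]])
f_hat = np.array([0.5, 1e-3])
pi_hat = transition_estimate(F_hat, f_hat, threshold=100)
```

In the toy example the first row gives the plain quotients 0.4 and 0.8, while the entry 5.0 of the second row is discarded because the estimated stationary density there is too small relative to it.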

4. Results 

Our first theorem regards the problem of deconvolution. This result may be put together with results of Comte et al. (2006b) in the i.i.d. case and of Comte et al. (2006a) in various mixing frameworks.



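Theorem 1. Under Assumptions A1-A4, consider the estimator $\tilde f = \hat f_{\hat m}$ where for each $m$, $\hat f_m$ is defined by (3) and $\hat m = \arg\min_{m \in \mathcal{M}_n} \{\gamma_n(\hat f_m) + \mathrm{pen}(m)\}$ with

$\mathrm{pen}(m) = k\,\frac{(\pi m)^{[s - (1-s)_+/2]_+}\,\Delta(m)}{n}$

where $k$ is a constant depending only on $k_0$, $k_1$, $b$, $\gamma$, $s$. Then there exists $C > 0$ such that

$\mathbb{E}\|\tilde f - f\|^2 \le 4 \inf_{m \in \mathcal{M}_n} \{\|f_m - f\|^2 + \mathrm{pen}(m)\} + \frac{C}{n}$,

where $f_m$ denotes the orthogonal projection of $f$ on $S_m$.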

The penalty is close to the variance order. It implies that the obtained rates of convergence are minimax in most cases. More precisely, the rates are given in the following corollary where $\lceil x \rceil$ denotes the ceiling function, i.e. the smallest integer larger than or equal to $x$.

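Corollary 1. Under the assumptions of Theorem 1, if $f$ belongs to $\mathcal{A}_{\delta,r,a}(l)$, then

• If $r = 0$ and $s = 0$: $\mathbb{E}\|\tilde f - f\|^2 \le C n^{-\frac{2\delta}{2\delta + 2\gamma + 1}}$

• If $r = 0$ and $s > 0$: $\mathbb{E}\|\tilde f - f\|^2 \le C (\ln n)^{-2\delta/s}$

• If $r > 0$ and $s = 0$: $\mathbb{E}\|\tilde f - f\|^2 \le C\,\frac{(\ln n)^{(2\gamma+1)/r}}{n}$

• If $r > 0$ and $s > 0$:

- if $r < s$ and $k = \lceil (s/r - 1)^{-1} \rceil - 1$, there exist reals $b_i$ such that

$\mathbb{E}\|\tilde f - f\|^2 \le C (\ln n)^{-2\delta/s} \exp\Big[\sum_{i=0}^k b_i (\ln n)^{(i+1)r/s - i}\Big]$

- if $r = s$, with $\xi = [2\delta b + (s - 2\gamma - 1 - [s - (1-s)_+/2]_+)a]/[(a+b)s]$,

$\mathbb{E}\|\tilde f - f\|^2 \le C n^{-a/(a+b)} (\ln n)^{-\xi}$

- if $r > s$ and $k = \lceil (r/s - 1)^{-1} \rceil - 1$, there exist reals $d_i$ such that

$\mathbb{E}\|\tilde f - f\|^2 \le C\,\frac{(\ln n)^{(1 + 2\gamma - s + [s - (1-s)_+/2]_+)/r}}{n} \exp\Big[-\sum_{i=0}^k d_i (\ln n)^{(i+1)s/r - i}\Big]$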

These rates are the same as those obtained in the case of i.i.d. variables $X_i$; they are studied in detail in Comte et al. (2006b). In the case $r > 0$, $s > 0$, we find the original rates obtained in Lacour (2006), proved as being optimal for $0 < r < s$ in Butucea and Tsybakov (2006). In the other cases, we can compare the results of Theorem 1 to the one obtained with a nonadaptive estimator. There is a loss only in the case $r \ge s > 1/3$ where a logarithmic term is added. But in this case, the rates are faster than any power of logarithm.

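Now let us study the risk for our estimator of the joint density $F$.

Theorem 2. Under Assumptions A1-A4, consider the estimator $\tilde F = \hat F_{\hat M}$ where for each $m$, $\hat F_m$ is defined by (5) and $\hat M = \arg\min_{m \in \mathbb{M}_n} \{\Gamma_n(\hat F_m) + \mathrm{Pen}(m)\}$ with

$\mathrm{Pen}(m) = K\,\frac{(\pi m)^{[s - (1-s)_+]_+}\,\Delta^2(m)}{n}$

where $K$ is a constant depending only on $k_0$, $k_1$, $b$, $\gamma$, $s$. Then there exists $C > 0$ such that

$\mathbb{E}\|\tilde F - F\|^2 \le 4 \inf_{m \in \mathbb{M}_n} \{\|F_m - F\|^2 + \mathrm{Pen}(m)\} + \frac{C}{n}$,

where $F_m$ denotes the orthogonal projection of $F$ on $\mathbb{S}_m$.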

The bases derived from the sine cardinal function are adapted to the estimation on the whole real line. The proof of Theorem 2 actually contains the proof of another result (see Proposition 2 in Section 6): the estimation of a bivariate density in a mixing framework on $\mathbb{R}^2$ and not only on a compact set. In this case of absence of noise ($\varepsilon = 0$), we obtain the same result with the penalty $\mathrm{Pen}(m) = K_0 (\sum_k \beta_{2k})\,m^2/n$. This limit case brings the mixing coefficients back into the penalty, as always happens in this kind of estimation (see e.g. Tribouley and Viennet (1998)).

It is then significant that in the presence of noise the penalty contains neither a mixing term nor any unknown quantity. It is entirely computable since it depends only on the characteristic function $q^*$ of the noise, which is known.

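Theorem 2 enables us to give rates of convergence for the estimation of $F$.

Corollary 2. Under the assumptions of Theorem 2, if $F$ belongs to $\mathcal{A}_{\Delta,R,A}(L)$, then

• If $R = 0$ and $s = 0$: $\mathbb{E}\|\tilde F - F\|^2 \le C n^{-\frac{2\Delta}{2\Delta + 4\gamma + 2}}$

• If $R = 0$ and $s > 0$: $\mathbb{E}\|\tilde F - F\|^2 \le C (\ln n)^{-2\Delta/s}$

• If $R > 0$ and $s = 0$: $\mathbb{E}\|\tilde F - F\|^2 \le C\,\frac{(\ln n)^{(4\gamma+2)/R}}{n}$

• If $R > 0$ and $s > 0$:

- if $R < s$ and $k = \lceil (s/R - 1)^{-1} \rceil - 1$, there exist reals $b_i$ such that

$\mathbb{E}\|\tilde F - F\|^2 \le C (\ln n)^{-2\Delta/s} \exp\Big[\sum_{i=0}^k b_i (\ln n)^{(i+1)R/s - i}\Big]$

- if $R = s$, with $\xi = [4\Delta b + (2s - 4\gamma - 2 - [s - (1-s)_+]_+)A]/[(A + 2b)s]$,

$\mathbb{E}\|\tilde F - F\|^2 \le C n^{-A/(A+2b)} (\ln n)^{-\xi}$

- if $R > s$ and $k = \lceil (R/s - 1)^{-1} \rceil - 1$, there exist reals $d_i$ such that

$\mathbb{E}\|\tilde F - F\|^2 \le C\,\frac{(\ln n)^{(2 + 4\gamma - 2s + [s - (1-s)_+]_+)/R}}{n} \exp\Big[-\sum_{i=0}^k d_i (\ln n)^{(i+1)s/R - i}\Big]$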

The rates of convergence look like those of Corollary 1 with modifications due to the bivariate nature of $F$. We can compare this result to the one of Clémençon (2003), who studies only the case $R = 0$ and $s = 0$. He shows that the minimax lower bound in that case is $n^{-\frac{2\Delta}{2\Delta + 4\gamma + 2}}$, so our procedure is optimal, whereas his estimator has a logarithmic loss for the upper bound. We remark that if $s > 0$ (supersmooth noise), the rate is logarithmic for $F$ belonging to a classic ordinary smooth space. But if $F$ is also supersmooth, better rates are recovered.
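In the ordinary smooth case the polynomial rates are easy to tabulate. The short sketch below computes the exponents of $n$ in the rates $n^{-2\delta/(2\delta+2\gamma+1)}$ for the stationary density and $n^{-2\Delta/(2\Delta+4\gamma+2)}$ for the joint density ($r = R = 0$, $s = 0$); the function names are illustrative.

```python
def rate_exponent_f(delta, gamma):
    """Exponent e such that the risk for f is of order n^(-e), case r = 0, s = 0."""
    return 2.0 * delta / (2.0 * delta + 2.0 * gamma + 1.0)

def rate_exponent_F(Delta, gamma):
    """Exponent e such that the risk for F is of order n^(-e), case R = 0, s = 0."""
    return 2.0 * Delta / (2.0 * Delta + 4.0 * gamma + 2.0)

# Same smoothness index 1 for f and F, noise with gamma = 1
e_f = rate_exponent_f(1.0, 1.0)   # 2 / 5
e_F = rate_exponent_F(1.0, 1.0)   # 2 / 8
```

For equal smoothness indices the exponent for $F$ is the smaller of the two, which reflects the bivariate nature of the problem: the noise degree $\gamma$ and the dimension term both enter twice.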
Except in the case $R = 0$ and $s = 0$, there is, to our knowledge, no lower bound available for this estimation. We can however evaluate the performance of this estimator by comparing it with a nonadaptive estimator. If the smoothness of $F$ is known, an $m$ depending on $R$ and $\Delta$ which minimizes the risk $\|F - F_m\|^2 + \Delta(m)^2/n$ can be exhibited and then some rates of convergence for this nonadaptive estimator are obtained. As soon as $s \le 1/2$ (i.e. $[s - (1-s)_+]_+ = 0$), the penalty is of order $\Delta(m)^2/n$ and then the adaptive estimator recovers the same rates of convergence as those of a nonadaptive estimator built with known regularity of $F$. It automatically minimizes the risk without prior knowledge of the regularity of $F$ and there is no loss in the rates. If $s > 1/2$ a loss can appear but is not systematic. If $R < s$, the rate of convergence is unchanged since the bias dominates. It is only in the case $R \ge s > 1/2$ that an additional logarithmic term appears. But in this case the risk decreases faster than any logarithmic function so that the loss is negligible.

We can now state the main result regarding the estimation of the transition density $\Pi$.

Theorem 3. Under Assumptions A1-A6, consider the estimator $\tilde\Pi$ defined in (6). We assume that $f$ belongs to $\mathcal{A}_{\delta,r,a}(l)$ with $\delta > 1/2$ and that we browse only the models $m \in \mathcal{M}_n$ such that

(7) $m \ge \ln \ln n$ and $m\,\Delta(m) \le \frac{n}{(\ln n)^2}$

to define $\tilde f$. Then $\tilde\Pi$ verifies, for $n$ large enough,

$\mathbb{E}\|\tilde\Pi - \Pi\|_B^2 \le C_1\,\mathbb{E}\|\tilde F - F\|^2 + C_2\,\mathbb{E}\|\tilde f - f\|^2 + \frac{C}{n}$

where $\|T\|_B^2 = \iint_{B^2} T^2(x,y)\,dx\,dy$.

Note that, contrary to Theorems 1 and 2, this result is asymptotic. It states that the rate of convergence for $\tilde\Pi$ is no larger than the maximum of the rates of $\tilde f$ and $\tilde F$. The restrictions (7) do not modify the conclusion of Theorem 1 and the resulting rates of convergence. Thus if $f$ and $F$ have the same regularity, the rates of convergence for $\tilde\Pi$ are those of $\tilde F$, given in Corollary 2.

If $s = 0$, i.e. if $\varepsilon_i$ is ordinary smooth, then the rates of convergence are polynomial and even near the parametric rate $1/n$ if $R$ and $r$ are positive. But the smoother the error distribution is, the harder the estimation is. In the case of a supersmooth noise, the rates are logarithmic if $f$ or $F$ is ordinary smooth but faster than any power of logarithm if the hidden chain has supersmooth densities. The exact rates depend on all regularities $\gamma$, $s$, $\delta$, $r$, $\Delta$, $R$ and are very tedious to write. That is why we prefer to give some detailed examples.

5. Examples 

5.1. Autoregressive process of order 1. Let us study the case where the Markov chain is defined by

$X_{n+1} = \alpha X_n + \beta + \eta_{n+1}$

where the $\eta_n$'s are i.i.d. centered Gaussian with variance $\sigma^2$. This chain is irreducible, Harris recurrent and geometrically $\beta$-mixing. The stationary distribution is Gaussian with mean $\beta/(1-\alpha)$ and variance $\sigma^2/(1-\alpha^2)$. So

$f^*(u) = \exp\Big[-iu\,\frac{\beta}{1-\alpha} - \frac{\sigma^2}{2(1-\alpha^2)}\,u^2\Big]$

and then a basic computation gives $\delta = 1/2$, $r = 2$. The function $F$ is the density of a Gaussian vector with mean $(\beta/(1-\alpha), \beta/(1-\alpha))$ and variance matrix $\frac{\sigma^2}{1-\alpha^2}\begin{pmatrix} 1 & \alpha \\ \alpha & 1 \end{pmatrix}$. So

$F^*(u,v) = \exp\Big[-i(u+v)\,\frac{\beta}{1-\alpha} - \frac{\sigma^2}{2(1-\alpha^2)}\,(u^2 + v^2 + 2\alpha uv)\Big]$

and $\Delta = 1/2$, $R = 2$.
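The stationary moments and the two Fourier transforms of the AR(1) example can be checked with a few lines of Python. The sketch below assumes $|\alpha| < 1$ (needed for stationarity) and uses illustrative parameter values; it verifies that the stated mean and variance are fixed points of the recursion and that $F^*(u, 0) = f^*(u)$, as it must be since $f$ is a marginal of $F$.

```python
import numpy as np

def ar1_stationary(alpha, beta, sigma):
    """Mean and variance of the stationary law of X_{n+1} = alpha X_n + beta + eta_{n+1}."""
    mean = beta / (1.0 - alpha)
    var = sigma**2 / (1.0 - alpha**2)
    return mean, var

def f_star(u, alpha, beta, sigma):
    """Fourier transform (convention e^{-ixu}) of the stationary Gaussian density."""
    mean, var = ar1_stationary(alpha, beta, sigma)
    return np.exp(-1j * u * mean - 0.5 * var * u**2)

def F_star(u, v, alpha, beta, sigma):
    """Fourier transform of the density of (X_i, X_{i+1}) in the stationary regime."""
    mean, var = ar1_stationary(alpha, beta, sigma)
    return np.exp(-1j * (u + v) * mean - 0.5 * var * (u**2 + v**2 + 2.0 * alpha * u * v))

alpha, beta, sigma = 0.6, 1.0, 0.8      # illustrative values with |alpha| < 1
mean, var = ar1_stationary(alpha, beta, sigma)
u = np.linspace(-3.0, 3.0, 13)
```

Both transforms decay like $\exp(-c\,u^2)$, which is the supersmooth behaviour with $r = R = 2$ used in the rates below.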

We can compute the rates of convergence for different kinds of noise $\varepsilon$. If $\varepsilon$ has a Laplace distribution, $q^*(u) = 1/(1+u^2)$ so $s = 0$, $\gamma = 2$. In this case, Corollaries 1 and 2 give $\mathbb{E}\|\tilde f - f\|^2 \le C (\ln n)^{5/2}/n$ and $\mathbb{E}\|\tilde F - F\|^2 \le C (\ln n)^5/n$. Consequently,

$\mathbb{E}\|\tilde\Pi - \Pi\|_B^2 \le C\,\frac{(\ln n)^5}{n}$

with $B$ an interval $[-d, d]$. This rate is close to the parametric rate $1/n$; it is due to the great smoothness of the chain compared with that of the error.

If now $\varepsilon$ has a normal distribution with variance $\tau^2$, then we compute



2 

E||n - n||s < Cn~^+^(lnn)"^+^^ 

5.2. Cox-Ingersoll-Ross process. Another example is given by $X_n = R_{n\tau}$ with $\tau$ a fixed sampling interval and $R_t$ the so-called Cox-Ingersoll-Ross process defined by

$dR_t = (2\theta R_t + \kappa\sigma^2)\,dt + 2\sigma\sqrt{R_t}\,dW_t, \qquad \theta < 0, \ \kappa \in \{2, 3, \dots\}$.

Following Chaleyat-Maurel and Genon-Catalot (2006), we observe that $X_n$ is the square of the Euclidean norm of a $\kappa$-dimensional vector whose components are linear autoregressive processes of order 1. The stationary distribution is a Gamma distribution with parameters $\kappa/2$ and $|\theta|/\sigma^2$ so that

$f^*(u) = \Big(1 + iu\,\frac{\sigma^2}{|\theta|}\Big)^{-\kappa/2}$

and $r = 0$, $\delta = (\kappa - 1)/2$. To compute the characteristic function of the joint density, we write

$F^*(u,v) = \int \mathbb{E}[e^{-ivX_1} \mid X_0 = x]\,e^{-iux} f(x)\,dx$.

Let $\beta^2 = \sigma^2(e^{2\theta\tau} - 1)/(2\theta)$. Then, conditionally on $X_0 = x$, $\beta^{-2}X_1$ is a non-central chi-square $\chi'^2(e^{2\theta\tau}x/\beta^2, \kappa)$, so that

$\mathbb{E}[e^{-ivX_1} \mid X_0 = x] = (1 + 2iv\beta^2)^{-\kappa/2} \exp\Big(-\frac{iv\,e^{2\theta\tau}x}{1 + 2iv\beta^2}\Big)$.

This implies

$F^*(u,v) = \Big[1 - (1 - e^{2\theta\tau})\,\frac{\sigma^4}{\theta^2}\,uv + i\,\frac{\sigma^2}{|\theta|}\,(u+v)\Big]^{-\kappa/2}$

and $R = 0$, $\Delta = (\kappa - 1)/2$. Then, if for example the noise has a Gaussian distribution ($\gamma = 0$, $s = 2$), the rate of convergence is $(\ln n)^{(1-\kappa)/2}$. But this rate is faster if $\varepsilon$ has a Gamma distribution with shape parameter $\alpha$ (so that $\gamma = \alpha$, $s = 0$): by Corollary 2 with $\Delta = (\kappa-1)/2$, we obtain in this case the rate $n^{-(\kappa-1)/(\kappa + 4\alpha + 1)}$.
5.3. Stochastic volatility model. Our work also covers some multiplicative models such as the so-called stochastic volatility model in finance (see Genon-Catalot et al. (2000) for the links between the standard continuous-time SV models and the hidden Markov models). Let

$Z_n = U_n^{1/2}\,\eta_n$

where $(U_n)$ is a nonnegative Markov chain, $(\eta_n)$ a sequence of i.i.d. standard Gaussian variables, the two sequences being independent. Setting $Y_n = \ln(Z_n^2)$, $X_n = \ln(U_n)$ and $\varepsilon_n = \ln(\eta_n^2)$ leads us back to our initial problem.

The noise distribution is the logarithm of a chi-square distribution and then verifies $q^*(x) = 2^{-ix}\,\Gamma(1/2 - ix)/\sqrt{\pi}$. Van Es et al. (2003) show that $|q^*(x)| \sim_{+\infty} \sqrt{2}\,e^{-\pi|x|/2}$ and then $s = 1$, $\gamma = 0$.

We assume that the logarithm of the hidden chain $X_n$ derives from a regular sampling of an Ornstein-Uhlenbeck process, i.e. $X_n = V_{n\tau}$ where $V_t$ is defined by the equation

$dV_t = \theta V_t\,dt + \sigma\,dB_t$

with $B_t$ a standard Brownian motion. Then all the assumptions are satisfied. Similarly to Subsection 5.1 the stationary distribution is Gaussian with mean 0 and variance $\sigma^2/(2|\theta|)$ and then $\delta = 1/2$, $r = 2$. In the same way $F$ is the density of a centered Gaussian vector and then $\Delta = 1/2$, $R = 2$. We obtain the following rate of convergence on some interval $B = [-d, d]$:

$\mathbb{E}\|\tilde\Pi - \Pi\|_B^2 \le C\,\frac{\sqrt{\ln n}\,\exp[(\pi/\beta)\sqrt{\ln n}]}{n}$

with $\beta^2 = \sigma^2(e^{2\theta\tau} - 1)/(2\theta)$.
6. Proofs

Here we do not prove the results concerning the estimation of $f$. Indeed they are similar to the ones concerning $F$ (but actually simpler) and to the ones of Comte et al. (2006b). It is then sufficient to use the corresponding proofs for $F$ mutatis mutandis.

For the sake of simplicity, all constants in the following are denoted by $C$, even if they have different values.

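6.1. Proof of Lemma 2. It is sufficient to prove the first assertion. First we write that $V_T(Y_k, Y_{k+1}) = \frac{1}{4\pi^2}\iint e^{iY_k u + iY_{k+1} v}\,T^*(u,v)/(q^*(-u)q^*(-v))\,du\,dv$ so that

$\mathbb{E}[V_T(Y_k, Y_{k+1}) \mid X_1, \dots, X_n] = \frac{1}{4\pi^2}\iint \mathbb{E}[e^{iY_k u + iY_{k+1} v} \mid X_1, \dots, X_n]\,\frac{T^*(u,v)}{q^*(-u)\,q^*(-v)}\,du\,dv$.

By using the independence between $(X_i)$ and $(\varepsilon_i)$, we compute

$\mathbb{E}[e^{iY_k u + iY_{k+1} v} \mid X_1, \dots, X_n] = \mathbb{E}[e^{iX_k u + iX_{k+1} v}\,e^{i\varepsilon_k u + i\varepsilon_{k+1} v} \mid X_1, \dots, X_n]$
$= e^{iX_k u + iX_{k+1} v}\,\mathbb{E}[e^{i\varepsilon_k u}]\,\mathbb{E}[e^{i\varepsilon_{k+1} v}] = e^{iX_k u + iX_{k+1} v} \int e^{ixu} q(x)\,dx \int e^{iyv} q(y)\,dy$
$= e^{iX_k u + iX_{k+1} v}\,q^*(-u)\,q^*(-v)$.

Then

$\mathbb{E}[V_T(Y_k, Y_{k+1}) \mid X_1, \dots, X_n] = \frac{1}{4\pi^2}\iint e^{iX_k u + iX_{k+1} v}\,q^*(-u)\,q^*(-v)\,\frac{T^*(u,v)}{q^*(-u)\,q^*(-v)}\,du\,dv$
$= \frac{1}{4\pi^2}\iint e^{iX_k u + iX_{k+1} v}\,T^*(u,v)\,du\,dv = T(X_k, X_{k+1})$.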

6.2. Proof of Theorem 2. First we introduce some auxiliary variables whose existence is ensured by the mixing Assumption A4. In the case of arithmetical mixing, since $\theta > 6$, there exists a real $c$ such that $0 < c < 1/2$ and $c\theta > 3$. We set in this case $q_n = \frac{1}{2}\lfloor n^c \rfloor$. In the case of geometrical mixing, we set $q_n = \frac{1}{2}\lfloor c \ln(n) \rfloor$ where $c$ is a real larger than $3/\theta$.

For the sake of simplicity, we suppose that $n = 4p_n q_n$, with $p_n$ an integer. Let for $i = 1, \dots, n/2$, $V_i = (X_{2i-1}, X_{2i})$ and for $l = 0, \dots, p_n - 1$, $A_l = (V_{2lq_n+1}, \dots, V_{(2l+1)q_n})$, $B_l = (V_{(2l+1)q_n+1}, \dots, V_{(2l+2)q_n})$. As in Viennet (1997), by using Berbee's coupling lemma, we can build a sequence $(A_l^*)$ such that

$A_l$ and $A_l^*$ have the same distribution,
$A_l^*$ and $A_{l'}^*$ are independent if $l \ne l'$,
$P(A_l \ne A_l^*) \le \beta_{2q_n}$.

In the same way, we build $(B_l^*)$ and we define for any $l \in \{0, \dots, p_n - 1\}$, $A_l^* = (V^*_{2lq_n+1}, \dots, V^*_{(2l+1)q_n})$, $B_l^* = (V^*_{(2l+1)q_n+1}, \dots, V^*_{(2l+2)q_n})$ so that the sequence $(V_1^*, \dots, V_{n/2}^*)$ and then the sequence $(X_1^*, \dots, X_n^*)$ are well defined. We can now define

$\Omega^* = \{\forall i, \ 1 \le i \le n, \ X_i = X_i^*\}$.

Then we split the risk into two terms:

$\mathbb{E}(\|\tilde F - F\|^2) = \mathbb{E}(\|\tilde F - F\|^2\,1_{\Omega^*}) + \mathbb{E}(\|\tilde F - F\|^2\,1_{\Omega^{*c}})$.
To pursue the proof, we observe that for all $T$, $T'$

$\Gamma_n(T) - \Gamma_n(T') = \|T - F\|^2 - \|T' - F\|^2 - 2Z_n(T - T')$

where $Z_n(T) = \frac{1}{n}\sum_{i=1}^n \Big\{V_T(Y_i, Y_{i+1}) - \iint T(x,y)F(x,y)\,dx\,dy\Big\}$.

Let us fix $m \in \mathbb{M}_n$ and denote by $F_m$ the orthogonal projection of $F$ on $\mathbb{S}_m$. Since $\Gamma_n(\tilde F) + \mathrm{Pen}(\hat M) \le \Gamma_n(F_m) + \mathrm{Pen}(m)$, we have

$\|\tilde F - F\|^2 \le \|F_m - F\|^2 + 2Z_n(\tilde F - F_m) + \mathrm{Pen}(m) - \mathrm{Pen}(\hat M)$
$\le \|F_m - F\|^2 + 2\|\tilde F - F_m\| \sup_{T \in B(\hat M)} Z_n(T) + \mathrm{Pen}(m) - \mathrm{Pen}(\hat M)$

where, for all $m'$, $B(m') = \{T \in \mathbb{S}_m + \mathbb{S}_{m'}, \ \|T\| = 1\}$. Then, using the inequality $2xy \le x^2/4 + 4y^2$,

(8) $\|\tilde F - F\|^2 \le \|F_m - F\|^2 + \frac{1}{4}\|\tilde F - F_m\|^2 + 4\sup_{T \in B(\hat M)} Z_n^2(T) + \mathrm{Pen}(m) - \mathrm{Pen}(\hat M)$.

By denoting $\mathbb{E}_X$ the expectation conditionally on $X_1, \dots, X_n$ and by using Lemma 2, $Z_n(T)$ can be split into two terms:

$Z_n(T) = Z_{n,1}(T) + Z_{n,2}(T)$

with

$Z_{n,1}(T) = \frac{1}{n}\sum_{i=1}^n \{V_T(Y_i, Y_{i+1}) - \mathbb{E}_X[V_T(Y_i, Y_{i+1})]\}$,

$Z_{n,2}(T) = \frac{1}{n}\sum_{i=1}^n \Big\{T(X_i, X_{i+1}) - \iint T(x,y)F(x,y)\,dx\,dy\Big\}$.

Now let $P_1(\cdot,\cdot)$ be a function such that for all $m$, $m'$,

(9) $16 P_1(m, m') \le \mathrm{Pen}(m) + \mathrm{Pen}(m')$.

Then (8) becomes

$\|\tilde F - F\|^2 \le \|F_m - F\|^2 + \frac{1}{2}(\|\tilde F - F\|^2 + \|F - F_m\|^2) + 2\mathrm{Pen}(m)$
$\quad + 8\Big[\sup_{T \in B(\hat M)} Z_{n,1}^2(T) - P_1(m, \hat M)\Big] + 8\Big[\sup_{T \in B(\hat M)} Z_{n,2}^2(T) - P_1(m, \hat M)\Big]$

which gives, by introducing a function $P_2(\cdot,\cdot)$,

$\frac{1}{2}\|\tilde F - F\|^2\,1_{\Omega^*} \le \frac{3}{2}\|F_m - F\|^2 + 2\mathrm{Pen}(m) + 8\sum_{m' \in \mathbb{M}_n}\Big[\sup_{T \in B(m')} Z_{n,1}^2(T) - P_1(m, m')\Big]_+$
$\quad + 8\sum_{m' \in \mathbb{M}_n}\Big[\sup_{T \in B(m')} Z_{n,2}^2(T) - P_2(m, m')\Big]_+ 1_{\Omega^*} + 8\sum_{m' \in \mathbb{M}_n}[P_2(m, m') - P_1(m, m')]_+$.
We now use the following propositions:

Proposition 1. Let $P_1(m, m') = C(q)\,(\pi m'')^{[s - (1-s)_+]_+}\,\Delta^2(m'')/n$ where $\Delta(m)$ is defined in (4), $m'' = \max(m, m')$ and $C(q)$ is a constant. Then, under the assumptions of Theorem 2, there exists a positive constant $C$ such that

(10) $\sum_{m' \in \mathbb{M}_n} \mathbb{E}\Big[\sup_{T \in B(m')} Z_{n,1}^2(T) - P_1(m, m')\Big]_+ \le \frac{C}{n}$.

Proposition 2. Let $P_2(m, m') = 96\,(\sum_k \beta_{2k})\,m''^2/n$ where $m'' = \max(m, m')$. Then, under the assumptions of Theorem 2, there exists a positive constant $C$ such that

(11) $\sum_{m' \in \mathbb{M}_n} \mathbb{E}\Big(\Big[\sup_{T \in B(m')} Z_{n,2}^2(T) - P_2(m, m')\Big]_+ 1_{\Omega^*}\Big) \le \frac{C}{n}$.

The definitions of the functions $P_1(m, m')$ and $P_2(m, m')$ given in Propositions 1 and 2 imply that there exists $m_0$ such that $\forall m' > m_0$, $P_1(m, m') \ge P_2(m, m')$. (If $s = 0 = \gamma$ (case of a null noise), this would be wrong and the penalty would then be $P_2(m, m')$ instead of $P_1(m, m')$.) Then

$\sum_{m' \in \mathbb{M}_n} [P_2(m, m') - P_1(m, m')]_+ \le \sum_{m' \le m_0} P_2(m, m') \le \frac{C(m_0)}{n}$.

Since $m''^{[s-(1-s)_+]_+}\Delta^2(m'') \le m^{[s-(1-s)_+]_+}\Delta^2(m) + m'^{[s-(1-s)_+]_+}\Delta^2(m')$, the condition (9) is verified with

$\mathrm{Pen}(m) = 16\,C(q)\,(\pi m)^{[s-(1-s)_+]_+}\,\frac{\Delta^2(m)}{n}$.

And finally, combining these bounds with Propositions 1 and 2,

$\mathbb{E}(\|\tilde F - F\|^2\,1_{\Omega^*}) \le 4(\|F_m - F\|^2 + \mathrm{Pen}(m)) + \frac{C}{n}$.
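For the term $\mathbb{E}(\|\tilde F - F\|^2\,1_{\Omega^{*c}})$, recall that

$\hat F_m(x,y) = \sum_{j,k} \hat A_{j,k}\,\varphi_{m,j}(x)\,\varphi_{m,k}(y)$ with $\hat A_{j,k} = \frac{1}{n}\sum_{i=1}^n V_{\varphi_{m,j} \otimes \varphi_{m,k}}(Y_i, Y_{i+1})$.

Thus, for any $m$ in $\mathbb{M}_n$,

(13) $\|\hat F_m\|^2 = \sum_{j,k}\Big[\frac{1}{n}\sum_{i=1}^n V_{\varphi_{m,j} \otimes \varphi_{m,k}}(Y_i, Y_{i+1})\Big]^2 \le \frac{1}{n}\sum_{i=1}^n \sum_{j,k} |V_{\varphi_{m,j} \otimes \varphi_{m,k}}(Y_i, Y_{i+1})|^2 \le \Big\|\sum_{j,k} |V_{\varphi_{m,j} \otimes \varphi_{m,k}}|^2\Big\|_\infty \le \Delta^2(m)$

using Lemma 3. Then $\|\hat F_{\hat M}\|^2 \le \Delta^2(\hat M) \le n$ since $\hat M$ belongs to $\mathbb{M}_n$. And

$\mathbb{E}(\|\tilde F - F\|^2\,1_{\Omega^{*c}}) \le \mathbb{E}(2(\|\tilde F\|^2 + \|F\|^2)\,1_{\Omega^{*c}}) \le 2(n + \|F\|^2)\,P(\Omega^{*c})$.
Using Assumption A4 in the geometric case, $\beta_{2q_n} \le M e^{-\theta c \ln(n)} \le M n^{-\theta c}$ and, in the other case, $\beta_{2q_n} \le M (2q_n)^{-\theta} \le M n^{-\theta c}$. Then $P(\Omega^{*c}) \le 2p_n \beta_{2q_n} \le nMn^{-c\theta}$. Since $c\theta > 3$, $P(\Omega^{*c}) \le Mn^{-2}$, which implies $\mathbb{E}(\|\tilde F - F\|^2\,1_{\Omega^{*c}}) \le C/n$.
Finally we obtain

$\mathbb{E}\|\tilde F - F\|^2 \le \mathbb{E}(\|\tilde F - F\|^2\,1_{\Omega^*}) + \mathbb{E}(\|\tilde F - F\|^2\,1_{\Omega^{*c}}) \le 4(\|F_m - F\|^2 + \mathrm{Pen}(m)) + \frac{C}{n}$.

This inequality holds for each $m \in \mathbb{M}_n$, so the result is proved.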

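6.3. Proof of Proposition 1. We start by isolating odd terms from even terms to avoid overlaps:

$Z_{n,1}(T) = \frac{1}{2} Z_{n,1}^o(T) + \frac{1}{2} Z_{n,1}^e(T)$

with

$Z_{n,1}^o(T) = \frac{2}{n}\sum_{i=1,\,i \text{ odd}}^n \{V_T(Y_i, Y_{i+1}) - \mathbb{E}_X[V_T(Y_i, Y_{i+1})]\}$,

$Z_{n,1}^e(T) = \frac{2}{n}\sum_{i=1,\,i \text{ even}}^n \{V_T(Y_i, Y_{i+1}) - \mathbb{E}_X[V_T(Y_i, Y_{i+1})]\}$.

It is sufficient to deal with the first term only, as the second one is similar. For each $i$, let $U_i = (Y_{2i-1}, Y_{2i})$, then

$Z_{n,1}^o(T) = \frac{1}{n/2}\sum_{i=1}^{n/2} \{V_T(U_i) - \mathbb{E}_X[V_T(U_i)]\}$.

Notice that conditionally on $X_1, \dots, X_n$, the $U_i$'s are independent. Thus we can use the Talagrand inequality recalled in Lemma 5. Note that if $T$ belongs to $\mathbb{S}_m + \mathbb{S}_{m'}$, then $T$ can be written $T_1 + T_2$ where $T_1^*$ has its support in $[-\pi m, \pi m]^2$ and $T_2^*$ has its support in $[-\pi m', \pi m']^2$. Then $T$ belongs to $\mathbb{S}_{m''}$ where $m''$ is defined by

(14) $m'' = \max(m, m')$.

Now let us compute $M_1$, $H$ and $v$ of Talagrand's inequality.
(1) If $T$ belongs to $B(m')$,

$V_T(x,y) = \sum_{j,k} a_{jk}\,V_{\varphi_{m'',j} \otimes \varphi_{m'',k}}(x,y) = \sum_{j,k} a_{jk}\,v_{\varphi_{m'',j}}(x)\,v_{\varphi_{m'',k}}(y)$.

Thus $|V_T(x,y)|^2 \le \sum_{j,k} |v_{\varphi_{m'',j}}(x)\,v_{\varphi_{m'',k}}(y)|^2$. So

$\sup_{T \in B(m')} \|V_T\|_\infty^2 \le \Big\|\sum_{j,k} |v_{\varphi_{m'',j}}(x)\,v_{\varphi_{m'',k}}(y)|^2\Big\|_\infty \le \Big\|\sum_j |v_{\varphi_{m'',j}}|^2\Big\|_\infty^2$.

By using Lemma 3, $M_1 = \Delta(m'')$.

(2) To compute $H^2$, we write

$\mathbb{E}_X\Big(\sup_{T \in B(m')} (Z_{n,1}^o)^2(T)\Big) \le \mathbb{E}_X\Big(\sum_{j,k} (Z_{n,1}^o)^2(\varphi_{m'',j} \otimes \varphi_{m'',k})\Big)$
$\le \sum_{j,k} \mathrm{Var}_X\Big[\frac{2}{n}\sum_{i=1,\,i \text{ odd}}^n v_{\varphi_{m'',j}}(Y_i)\,v_{\varphi_{m'',k}}(Y_{i+1})\Big]$
$\le \sum_{j,k} \frac{4}{n^2}\sum_{i=1,\,i \text{ odd}}^n \mathrm{Var}_X\big[v_{\varphi_{m'',j}}(Y_i)\,v_{\varphi_{m'',k}}(Y_{i+1})\big]$

since, conditionally on $X_1, \dots, X_n$, the $U_i$'s are independent. And then

$\mathbb{E}_X\Big(\sup_{T \in B(m')} (Z_{n,1}^o)^2(T)\Big) \le \frac{4}{n^2}\sum_{j,k}\sum_{i=1,\,i \text{ odd}}^n \mathbb{E}_X\big[|v_{\varphi_{m'',j}}(Y_i)\,v_{\varphi_{m'',k}}(Y_{i+1})|^2\big]$
$\le \frac{4}{n^2}\sum_{i=1,\,i \text{ odd}}^n \Big\|\sum_j |v_{\varphi_{m'',j}}|^2\Big\|_\infty \Big\|\sum_k |v_{\varphi_{m'',k}}|^2\Big\|_\infty \le \frac{2\Delta(m'')^2}{n}$.

So we set $H = \sqrt{2}\,\Delta(m'')/\sqrt{n}$.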
(3) We still have to find v. On the one hand 



Vnr x [V T (Y k ,Y k+1 )} < E x [C£a jk v Vmll .(Y k )v Vmllk (Y k+1 )) 2 } 



3,k 



- E°i fc ii E i^m",j 2 iiooii E 

j,k j k 



J Vm.",k\ H°° 



and so f > A(m") . On the other hand 



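\[ \mathrm{Var}_X[V_T(Y_k, Y_{k+1})] \le \sum_{j_1,k_1} \sum_{j_2,k_2} a_{j_1 k_1} a_{j_2 k_2}\, \mathbb{E}_X\big[ v_{\varphi_{m'',j_1}} v_{\varphi_{m'',j_2}}(Y_k)\, v_{\varphi_{m'',k_1}} v_{\varphi_{m'',k_2}}(Y_{k+1}) \big] \]

\[ \le \sum_{j,k} a_{jk}^2 \sqrt{ \sum_{j_1,j_2} \big| \mathbb{E}_X[ v_{\varphi_{m'',j_1}} v_{\varphi_{m'',j_2}}(Y_k) ] \big|^2 \sum_{k_1,k_2} \big| \mathbb{E}_X[ v_{\varphi_{m'',k_1}} v_{\varphi_{m'',k_2}}(Y_{k+1}) ] \big|^2 }, \tag{15} \]

using conditional independence. Now we use Lemma 3 to compute

\[ \mathbb{E}_X[ v_{\varphi_{m'',j_1}} v_{\varphi_{m'',j_2}}(Y_k) ] = \int \frac{m''}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \frac{e^{-i j_1 v} e^{i(x+X_k) v m''}}{q^*(-v m'')}\, \frac{e^{-i j_2 u} e^{i(x+X_k) u m''}}{q^*(-u m'')}\, dv\, du\, q(x)\, dx \tag{16} \]

\[ = \frac{m''}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \frac{e^{-i j_1 v - i j_2 u}\, e^{i X_k (u+v) m''}}{q^*(-v m'')\, q^*(-u m'')} \int e^{i x (u+v) m''} q(x)\, dx\, du\, dv. \]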
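If we set $W(u,v) = m'' e^{i X_k (u+v) m''}\, q^*(-(u+v)m'') / [q^*(-v m'')\, q^*(-u m'')]$, then
$\mathbb{E}_X[ v_{\varphi_{m'',j_1}} v_{\varphi_{m'',j_2}}(Y_k) ]$ is the Fourier coefficient with order $(j_1, j_2)$ of $W$. Using Parseval's formula

\[ \sum_{j_1,j_2} \big| \mathbb{E}_X[ v_{\varphi_{m'',j_1}} v_{\varphi_{m'',j_2}}(Y_k) ] \big|^2 = \frac{1}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} |W(u,v)|^2\, du\, dv = \frac{m''^2}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \left| \frac{q^*(-(u+v)m'')}{q^*(-v m'')\, q^*(-u m'')} \right|^2 du\, dv. \]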



Now we apply Schwarz inequality: 



JU2 
1/2 

m 



\q*{—{u + f )ra")p 



■dudv * 



\q*(—{u + u)m")| 5 



< 



m 
4^2 



|g*(— um")| 4 
|g*(— Mm")|~ 4 <i-u / |g*(x)| 2 (ia; < 



dudv 



\q*(-vm")\ 4 
'M-;.;„ / I . , .. / Htfll 2 /" 7Fm u*/ 



4tt 2 



\q*(-u)\-*du. 



We introduce the following notation: 

1 



A 2 (m) 



4vr 2 



|g*(«)|- 4 d«. 



Finally, coming back to (15), $\mathrm{Var}_X[V_T(Y_k, Y_{k+1})] \le \|T\|^2 \|q\|^2 \Delta_2(m'')$ which
yields $v \ge \|q\|^2 \Delta_2(m'')$. Finally we write $v = \min(\|q\|^2 \Delta_2(m''), \Delta^2(m''))$.
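We can now use Talagrand's inequality (see Lemma 5):

\[ \mathbb{E}_X\Big[ \sup_{T \in B(m')} (Z^{o}_{n,1})^2(T) - 2(1+2\varepsilon) \frac{2\Delta^2(m'')}{n} \Big]_+ \le \frac{C}{n} \Big( v\, e^{-K_1 \varepsilon \Delta^2(m'')/v} + \frac{\Delta^2(m'')}{n C^2(\varepsilon)}\, e^{-K_2 C(\varepsilon) \sqrt{\varepsilon} \sqrt{n}} \Big). \]

And then, if $P_1(m,m') \ge 4(1+2\varepsilon) \Delta^2(m'')/n$,

\[ \sum_{m' \in \mathcal{M}_n} \mathbb{E}_X\Big[ \sup_{T \in B(m')} (Z^{o}_{n,1})^2(T) - P_1(m,m') \Big]_+ \le \frac{K}{n} \{ I(m) + II(m) \} \]

with $I(m) = \sum_{m' \in \mathcal{M}_n} v\, e^{-K_1 \varepsilon \Delta^2(m'')/v}$; $II(m) = \sum_{m' \in \mathcal{M}_n} \frac{\Delta^2(m'')}{n C^2(\varepsilon)}\, e^{-K_2 \sqrt{\varepsilon}\, C(\varepsilon) \sqrt{n}}$.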
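To bound these terms, we use Lemma 4 which yields

\[ v \le c_3 (\pi m'')^{4\gamma + \min(1-s,\, 2-2s)} e^{4b(\pi m'')^s} \quad \text{and} \quad \frac{\Delta^2(m'')}{v} \ge c_4 (\pi m'')^{(1-s)_+} \]

where $c_3$ and $c_4$ depend only on $k_0$, $k_1$, $\gamma$ and $s$. Therefore

\[ I(m) \le c_3 \sum_{m' \in \mathcal{M}_n} (\pi m'')^{4\gamma + \min(1-s,\, 2-2s)}\, e^{4b(\pi m'')^s - K_1 c_4 \varepsilon (\pi m'')^{(1-s)_+}} \]

\[ \le c_3 \sum_{m' \in \mathcal{M}_n} \Big[ (\pi m)^{4\gamma + \min(1-s,\, 2-2s)} e^{4b(\pi m)^s} + (\pi m')^{4\gamma + \min(1-s,\, 2-2s)} e^{4b(\pi m')^s} \Big]\, e^{-\frac{K_1 c_4 \varepsilon}{2} [ (\pi m)^{(1-s)_+} + (\pi m')^{(1-s)_+} ]} \]

\[ \le c_3 (\pi m)^{4\gamma + \min(1-s,\, 2-2s)}\, e^{4b(\pi m)^s - \frac{K_1 c_4 \varepsilon}{2} (\pi m)^{(1-s)_+}} \sum_{m' \in \mathcal{M}_n} e^{-\frac{K_1 c_4 \varepsilon}{2} (\pi m')^{(1-s)_+}} \]

\[ \quad + c_3\, e^{-\frac{K_1 c_4 \varepsilon}{2} (\pi m)^{(1-s)_+}} \sum_{m' \in \mathcal{M}_n} (\pi m')^{4\gamma + \min(1-s,\, 2-2s)}\, e^{4b(\pi m')^s - \frac{K_1 c_4 \varepsilon}{2} (\pi m')^{(1-s)_+}}. \]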

We have to distinguish three cases.

case $s < (1-s)_+ \Leftrightarrow s < 1/2$: In this case we choose $\varepsilon = 8b/(K_1 c_4)$ and then

\[ I(m) \le c_3 (\pi m)^{4\gamma+1-s}\, e^{4b[(\pi m)^s - (\pi m)^{1-s}]} \sum_{m' \in \mathcal{M}_n} e^{-4b(\pi m')^{1-s}} + c_3\, e^{-4b(\pi m)^{1-s}} \sum_{m' \in \mathcal{M}_n} (\pi m')^{4\gamma+1-s}\, e^{4b[(\pi m')^s - (\pi m')^{1-s}]} \]

which implies that $I(m)$ is bounded. Moreover the definition of $\mathcal{M}_n$ and
Lemma 4 give $|\mathcal{M}_n| \le C n^{\zeta}$ with $C > 0$ and $\zeta > 0$. So $II(m) \le (C/n) |\mathcal{M}_n|\, e^{-K' \sqrt{n}}$
is bounded too.
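case $s = (1-s)_+ \Leftrightarrow s = 1/2$: In this case

\[ I(m) \le c_3 (\pi m)^{4\gamma+1/2}\, e^{(4b - K_1 c_4 \varepsilon/2)(\pi m)^{1/2}} \sum_{m' \in \mathcal{M}_n} e^{-\frac{K_1 c_4 \varepsilon}{2} (\pi m')^{1/2}} + c_3\, e^{-\frac{K_1 c_4 \varepsilon}{2} (\pi m)^{1/2}} \sum_{m' \in \mathcal{M}_n} (\pi m')^{4\gamma+1/2}\, e^{(4b - K_1 c_4 \varepsilon/2)(\pi m')^{1/2}}. \]

We choose $\varepsilon$ such that $4b - K_1 c_4 \varepsilon / 2 = -4b$ so that

\[ I(m) \le c_3 (\pi m)^{4\gamma+1/2}\, e^{-4b(\pi m)^{1/2}} \sum_{m' \in \mathcal{M}_n} e^{-\frac{K_1 c_4 \varepsilon}{2} (\pi m')^{1/2}} + c_3\, e^{-\frac{K_1 c_4 \varepsilon}{2} (\pi m)^{1/2}} \sum_{m' \in \mathcal{M}_n} (\pi m')^{4\gamma+1/2}\, e^{-4b(\pi m')^{1/2}} \le C. \]

The term $II(m)$ is also bounded since $\varepsilon$ is a constant.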
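case $s > (1-s)_+ \Leftrightarrow s > 1/2$: Here we choose $\varepsilon$ such that

\[ 4b(\pi m'')^s - K_1 c_4 \varepsilon (\pi m'')^{(1-s)_+}/2 = -4b(\pi m'')^s \]

so that

\[ I(m) \le c_3 (\pi m)^{4\gamma + \min(1-s,\, 2-2s)}\, e^{-4b(\pi m)^s} \sum_{m' \in \mathcal{M}_n} e^{-\frac{K_1 c_4 \varepsilon}{2} (\pi m')^{(1-s)_+}} + c_3\, e^{-\frac{K_1 c_4 \varepsilon}{2} (\pi m)^{(1-s)_+}} \sum_{m' \in \mathcal{M}_n} (\pi m')^{4\gamma + \min(1-s,\, 2-2s)}\, e^{-4b(\pi m')^s} \le C. \]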

Moreover 



JJ( m ) = _J_ e ~K 2 ^b/I^ i (7rm'')^ 1 -^ 2 ^ < 

m'£M„ ^ 

In any case $\varepsilon = [8b/(K_1 c_4)]\, (\pi m'')^{[s-(1-s)_+]_+}$, so that

\[ P_1(m,m') = C(q)\, (\pi m'')^{[s-(1-s)_+]_+}\, \Delta^2(m'')/n \]

where $C(q)$ is a constant depending only on $k_0$, $k_1$, $b$, $\gamma$, $s$.
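The exponent of $\pi m''$ in the penalty term depends on the noise smoothness $s$ through the three cases above. The following minimal sketch, assuming that the exponent reads $[s-(1-s)_+]_+$ as in the rate stated for Corollary 2 (the helper name `penalty_exponent` is hypothetical), evaluates it for a few values of $s$.

```python
def penalty_exponent(s: float) -> float:
    """Exponent [s - (1 - s)_+]_+ of (pi * m'') in the penalty term P_1.

    Assumption: the exponent has this form; (t)_+ denotes max(t, 0).
    """
    def positive_part(t: float) -> float:
        return max(t, 0.0)

    return positive_part(s - positive_part(1.0 - s))


# s <= 1/2 gives no extra polynomial factor; for 1/2 < s < 1 the exponent is 2s - 1.
for s in (0.0, 0.25, 0.5, 0.75, 1.0, 2.0):
    print(s, penalty_exponent(s))
```

For $s \le 1/2$ the exponent vanishes, so the penalty is of order $\Delta^2(m'')/n$ with no additional power of $\pi m''$.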

6.4. Proof of Proposition El We split Z n ^{T) into two terms : 



C. 



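\[ Z_{n,2}(T) = \frac{1}{2} Z^{o}_{n,2}(T) + \frac{1}{2} Z^{e}_{n,2}(T) \]

with

\[ Z^{o}_{n,2}(T) = \frac{2}{n} \sum_{i=1,\ i\ \mathrm{odd}}^{n} \Big\{ T(X_i, X_{i+1}) - \iint T(x,y) F(x,y)\, dx\, dy \Big\}, \]

\[ Z^{e}_{n,2}(T) = \frac{2}{n} \sum_{i=1,\ i\ \mathrm{even}}^{n} \Big\{ T(X_i, X_{i+1}) - \iint T(x,y) F(x,y)\, dx\, dy \Big\}. \]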

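We bound $\mathbb{E}\big( [\sup_{T \in B(m')} (Z^{o}_{n,2})^2(T) - P_2(m,m')]_+ 1_{\Omega^*} \big)$. The second term can
be bounded in the same way. We write $Z^{o}_{n,2}(T) = (2/n) \sum_{i=1}^{n/2} \{ T(V_i) - \mathbb{E}[T(V_i)] \}$
with $V_i = (X_{2i-1}, X_{2i})$. In order to use Lemma 5 we introduce

\[ Z^{o*}_{n,2}(T) = \frac{1}{2} \nu^*_{n,1}(T) + \frac{1}{2} \nu^*_{n,2}(T) \]

where

\[ \nu^*_{n,1}(T) = \frac{1}{p_n} \sum_{l=0}^{p_n-1} \frac{1}{q_n} \sum_{i=2lq_n+1}^{(2l+1)q_n} \{ T(V_i^*) - \mathbb{E}[T(V_i^*)] \}, \]

\[ \nu^*_{n,2}(T) = \frac{1}{p_n} \sum_{l=0}^{p_n-1} \frac{1}{q_n} \sum_{i=(2l+1)q_n+1}^{(2l+2)q_n} \{ T(V_i^*) - \mathbb{E}[T(V_i^*)] \}. \]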
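The two sums above run over alternating blocks of $q_n$ consecutive indices. As an illustration only (the function name `block_indices` is hypothetical and the values of $p_n$ and $q_n$ are arbitrary), the following sketch lists the index blocks $\{2lq_n+1, \dots, (2l+1)q_n\}$ and $\{(2l+1)q_n+1, \dots, (2l+2)q_n\}$ and shows that together they partition $\{1, \dots, 2p_nq_n\}$.

```python
def block_indices(p_n: int, q_n: int):
    """Return the index blocks used in the two alternating block sums.

    first[l]  = {2*l*q_n + 1, ..., (2*l + 1)*q_n}
    second[l] = {(2*l + 1)*q_n + 1, ..., (2*l + 2)*q_n}
    """
    first = [list(range(2 * l * q_n + 1, (2 * l + 1) * q_n + 1)) for l in range(p_n)]
    second = [list(range((2 * l + 1) * q_n + 1, (2 * l + 2) * q_n + 1)) for l in range(p_n)]
    return first, second


first, second = block_indices(p_n=3, q_n=2)
print(first)   # blocks entering the first sum
print(second)  # blocks entering the second sum

# Together the blocks cover 1..2*p_n*q_n exactly once.
all_indices = sorted(i for block in first + second for i in block)
print(all_indices == list(range(1, 2 * 3 * 2 + 1)))
```

Within each sum, consecutive blocks are separated by a gap of $q_n$ indices, which is what allows the coupled blocks to be treated as independent.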






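Since $X_i = X_i^*$ on $\Omega^*$, we can replace $Z^{o}_{n,2}$ by $Z^{o*}_{n,2}$. This leads us to bound
$\mathbb{E}\big( [\sup_{T \in B(m')} (\nu^*_{n,1})^2(T) - P_2(m,m')]_+ 1_{\Omega^*} \big)$. So we compute the bounds $M_1$, $H$ and
$v$ of Lemma 5.

(1) If $T$ belongs to $\mathbb{S}_{m''}$, $|T(x,y)|^2 \le \sum_{j,k} a_{jk}^2 \sum_{j,k} \varphi^2_{m'',j}(x)\, \varphi^2_{m'',k}(y)$ and so

\[ \|T\|_\infty^2 \le \|T\|^2\, \Big\| \sum_j \varphi^2_{m'',j} \Big\|_\infty^2 \le \|T\|^2\, m''^2, \]

using (1) of Lemma 3. Then $\big\| (1/q_n) \sum_{i=2lq_n+1}^{(2l+1)q_n} T \big\|_\infty \le m''$ and $M_1 = m''$.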

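(2) Let us compute $H^2$.

\[ \sup_{T \in B(m')} (\nu^*_{n,1})^2(T) \le \sum_{j,k} (\nu^*_{n,1})^2(\varphi_{m'',j} \otimes \varphi_{m'',k}). \]

Then, by taking the expectation,

\[ \mathbb{E}\Big( \sup_{T \in B(m')} (\nu^*_{n,1})^2(T) \Big) \le \sum_{j,k} \mathrm{Var}\Big( \frac{1}{p_n} \sum_{l=0}^{p_n-1} \frac{1}{q_n} \sum_{i=2lq_n+1}^{(2l+1)q_n} \varphi_{m'',j} \otimes \varphi_{m'',k}(V_i^*) \Big) \]

\[ \le \sum_{j,k} \frac{1}{p_n^2} \sum_{l=0}^{p_n-1} \mathrm{Var}\Big( \frac{1}{q_n} \sum_{i=2lq_n+1}^{(2l+1)q_n} \varphi_{m'',j} \otimes \varphi_{m'',k}(V_i^*) \Big) \]

by using independence of the $A_l^*$. Lemma 6 then gives

\[ \mathbb{E}\Big( \sup_{T \in B(m')} (\nu^*_{n,1})^2(T) \Big) \le \frac{4}{p_n q_n} \Big\| \sum_{j,k} (\varphi_{m'',j} \otimes \varphi_{m'',k})^2 \Big\|_\infty \sum_k \beta_{2k} \le \frac{16}{n} \Big( \sum_k \beta_{2k} \Big) m''^2. \]

Finally $H = 4 \sqrt{\sum_k \beta_{2k}}\; m'' / \sqrt{n}$.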
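(3) $v$ remains to be calculated. If $T$ belongs to $B(m')$, using Lemma 6,

\[ \mathrm{Var}\Big[ \frac{1}{q_n} \sum_{i=2lq_n+1}^{(2l+1)q_n} T(V_i^*) \Big] \le \frac{4}{q_n} \mathbb{E}[T^2(V_1)\, b(V_1)] \le \frac{4}{q_n} \|T\|_\infty\, \mathbb{E}^{1/2}[T^2(V_1)]\, \mathbb{E}^{1/2}[b^2(V_1)] \le \frac{4}{q_n} \|T\|_\infty\, \|T\|\, \|F\|_\infty^{1/2} \sqrt{2 \sum_k (k+1) \beta_{2k}} \]

and so $v = 4 \|F\|_\infty^{1/2} \sqrt{2 \sum_k (k+1) \beta_{2k}}\; m'' / q_n$.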
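By writing Talagrand's inequality (Lemma 5) with $\varepsilon = 1$, we obtain

\[ \mathbb{E}\Big( \Big[ \sup_{T \in B(m')} (\nu^*_{n,1})^2(T) - \frac{96}{n} \Big( \sum_k \beta_{2k} \Big) m''^2 \Big]_+ \Big) \le \frac{K}{n} \Big\{ m'' e^{-K' m''} + m''^2\, n^{2c-1} e^{-K'' n^{1/2-c}} \Big\}. \]

Then by summation over $m'$

\[ \sum_{m' \in \mathcal{M}_n} \mathbb{E}\Big( \Big[ \sup_{T \in B(m')} (\nu^*_{n,1})^2(T) - \frac{96}{n} \Big( \sum_k \beta_{2k} \Big) m''^2 \Big]_+ 1_{\Omega^*} \Big) \le \frac{K}{n} \Big\{ \sum_{m' \in \mathcal{M}_n} m'' e^{-K' m''} + \sum_{m' \in \mathcal{M}_n} m''^2\, n^{2c-1} e^{-K'' n^{1/2-c}} \Big\} \le \frac{C}{n} \]

since $c < 1/2$. In the same way, we obtain

\[ \sum_{m' \in \mathcal{M}_n} \mathbb{E}\Big( \Big[ \sup_{T \in B(m')} (\nu^*_{n,2})^2(T) - \frac{96}{n} \Big( \sum_k \beta_{2k} \Big) m''^2 \Big]_+ 1_{\Omega^*} \Big) \le \frac{C}{n}, \]

which yields

\[ \sum_{m' \in \mathcal{M}_n} \mathbb{E}\Big( \Big[ \sup_{T \in B(m')} (Z^{o}_{n,2})^2(T) - P_2(m,m') \Big]_+ 1_{\Omega^*} \Big) \le \frac{C}{n} \]

with $P_2(m,m') = 96 \big( \sum_k \beta_{2k} \big) m''^2 / n$.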
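The constant 96 in $P_2$ can be traced back to the centering term $2(1+2\varepsilon)H^2$ of Talagrand's inequality. The sketch below is only an arithmetic check, assuming $H = 4\sqrt{\sum_k \beta_{2k}}\, m''/\sqrt{n}$ and $\varepsilon = 1$ as in the proof above; the function name and the numerical values are illustrative.

```python
import math


def centering_term(beta_sum: float, m2: float, n: int, eps: float = 1.0) -> float:
    """Return 2 * (1 + 2 * eps) * H^2 with H = 4 * sqrt(beta_sum) * m2 / sqrt(n).

    beta_sum stands for the sum of the mixing coefficients beta_{2k},
    m2 stands for m'' = max(m, m'). Both values below are made up.
    """
    H = 4.0 * math.sqrt(beta_sum) * m2 / math.sqrt(n)
    return 2.0 * (1.0 + 2.0 * eps) * H ** 2


beta_sum, m2, n = 0.5, 4.0, 100
print(centering_term(beta_sum, m2, n))   # 2 * 3 * 16 * beta_sum * m2^2 / n
print(96.0 * beta_sum * m2 ** 2 / n)     # same quantity, written as P_2(m, m')
```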

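6.5. Proof of Corollary 2. Let us compute the bias term. Since $F_m^* = F^* 1_{[-\pi m, \pi m]^2}$,

\[ \|F - F_m\|^2 = \frac{1}{4\pi^2} \iint_{([-\pi m, \pi m]^2)^c} |F^*(u,v)|^2\, du\, dv \le \frac{1}{4\pi^2} \Big[ \iint_{[-\pi m, \pi m]^c \times \mathbb{R}} |F^*(u,v)|^2\, du\, dv + \iint_{\mathbb{R} \times [-\pi m, \pi m]^c} |F^*(u,v)|^2\, du\, dv \Big]. \]

But

\[ \iint_{[-\pi m, \pi m]^c \times \mathbb{R}} |F^*(u,v)|^2\, du\, dv \le L\, ((\pi m)^2 + 1)^{-\Delta}\, e^{-2A(\pi m)^R}. \]

Thus $\|F - F_m\|^2 = O\big( (\pi m)^{-2\Delta} e^{-2A(\pi m)^R} \big)$ and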



E||F-F|| 2 <C" inf ((vrm)- 2A e- 2A(wm) 

m£M„ ^ 



H ( 7rm )[s-(l-^)+] + +47+2-2 S f I 1 

n \ n 



Next the bias-variance trade-off is performed similarly to Lacour (2006).



6.6. Proof of Theorem H Let E n = {\\f - fW^ < f /2}. On E n and for x G B, 

f(x) = f(x)-f(x) + f(x) > f /2. Since F belongs to using ©, \\F\U < M\\F\\. 
But (EE3D gives ||F|| < A(M) so that \\F\loo < MA(M). Since M belongs to M n , 
A(M) < v/n and Lemma H gives M < A(Mf/^ +1 ^ if s = or M < (In A(M)) 1 / 8 
otherwise. So, for n large enough, (2// )||F|| oo < n and n(x,?/) = F(x,y)/ f(x). 
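The argument above uses that, on $E_n$ and for $n$ large enough, the quotient $F/f$ of the two estimators is bounded by $n$ in absolute value, so that the estimator of $\Pi$ coincides with the plain quotient. A minimal sketch of such a truncated quotient follows; the truncation rule (set the estimate to 0 whenever $|F| > n|f|$) is an assumption inferred from this bound, and the function name is hypothetical.

```python
def truncated_quotient(F_hat: float, f_hat: float, n: int) -> float:
    """Quotient estimate F_hat / f_hat, set to 0 when it would exceed n in absolute value.

    Assumption: this truncation rule is inferred from the bound (2/f_0)||F||_inf <= n
    used in the proof; it is not quoted from the definition of the estimator.
    """
    if f_hat != 0.0 and abs(F_hat) <= n * abs(f_hat):
        return F_hat / f_hat
    return 0.0


print(truncated_quotient(0.3, 0.6, n=100))    # regular case: plain quotient
print(truncated_quotient(0.3, 1e-6, n=100))   # denominator too small: truncated to 0
```

With this rule the estimate never exceeds $n$ in absolute value, which is what produces the term of order $n^2 P(E_n^c)$ in the risk bound.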
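For all $(x,y) \in B^2$,

\[ |\tilde\Pi(x,y) - \Pi(x,y)|^2 \le \frac{|\tilde F(x,y) - \tilde f(x) \Pi(x,y)|^2}{|\tilde f(x)|^2}\, 1_{E_n} + \big( |\tilde\Pi(x,y)| + |\Pi(x,y)| \big)^2\, 1_{E_n^c} \]

\[ \le \frac{|\tilde F(x,y) - F(x,y) + \Pi(x,y)(f(x) - \tilde f(x))|^2}{f_0^2/4}\, 1_{E_n} + 2\big( n^2 + |\Pi(x,y)|^2 \big)\, 1_{E_n^c}. \]

Since $\int_B \Pi^2(x,y)\, dy \le \|\Pi\|_{B,\infty} \int_B \Pi(x,y)\, dy \le \|\Pi\|_{B,\infty}$ for all $x \in B$,

\[ \mathbb{E}\|\tilde\Pi - \Pi\|_B^2 \le \frac{8}{f_0^2} \big[ \mathbb{E}\|F - \tilde F\|^2 + \|\Pi\|_{B,\infty}\, \mathbb{E}\|f - \tilde f\|^2 \big] + 2|B| \big( |B| n^2 + \|\Pi\|_{B,\infty} \big) P(E_n^c). \]

We still have to prove that $P(E_n^c) \le C n^{-3}$. Given that $\|f - \tilde f\|_\infty \le \|f - f_{\hat m}\|_\infty + \|f_{\hat m} - \hat f_{\hat m}\|_\infty$ we obtain

\[ P(E_n^c) \le P(\|f - f_{\hat m}\|_\infty > f_0/4) + P(\|f_{\hat m} - \hat f_{\hat m}\|_\infty > f_0/4). \]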

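Let us prove now that if $f$ belongs to $\mathcal{A}_{\delta,r,a}(l)$ with $\delta > 1/2$, $\|f - f_m\|_\infty = O\big( m^{1/2-\delta} e^{-a(\pi m)^r} \big)$. Since $f_m^* = f^* 1_{[-\pi m, \pi m]}$ and using the inverse Fourier transform,

\[ \|f - f_m\|_\infty \le \frac{1}{2\pi} \int_{|u| > \pi m} |f^*(u)|\, du. \]

Let $\alpha \in (1/4, \delta/2)$. By considering that the function $x \mapsto (x^2+1)^{\delta/2-\alpha} e^{a|x|^r}$ is increasing
and using the Schwarz inequality, we obtain

\[ \|f - f_m\|_\infty \le \frac{1}{2\pi} ((\pi m)^2 + 1)^{-\delta/2+\alpha}\, e^{-a(\pi m)^r} \sqrt{l}\, \Big( \int_{|u| > \pi m} (u^2+1)^{-2\alpha}\, du \Big)^{1/2}. \]

But $\int_{|u| > \pi m} (u^2+1)^{-2\alpha}\, du \le C (\pi m)^{1-4\alpha}$ and then

\[ \|f - f_m\|_\infty \le \frac{\sqrt{lC}}{2\pi} ((\pi m)^2 + 1)^{-\delta/2+\alpha}\, e^{-a(\pi m)^r} (\pi m)^{1/2-2\alpha} = O\big( m^{1/2-\delta} e^{-a(\pi m)^r} \big). \]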
27T 



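Thus, since $\hat m \ge \ln \ln n$, $\|f - f_{\hat m}\|_\infty \to 0$ and for $n$ large enough $P(\|f - f_{\hat m}\|_\infty > f_0/4) = 0$. Next

\[ P(\|f_{\hat m} - \hat f_{\hat m}\|_\infty > f_0/4) \le P(\Omega^{*c}) + P\Big( \|f_{\hat m} - \hat f_{\hat m}\|\, 1_{\Omega^*} > \frac{f_0}{4\sqrt{\hat m}} \Big). \]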

Since c9 > 3, P(fi* c ) < Mn^ ce < Mn' 2 . We still have to prove that 
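First, we observe that

\[ \|f_{\hat m} - \hat f_{\hat m}\|^2 = \sum_j \Big[ \frac{1}{n} \sum_{i=1}^{n} \big( v_{\varphi_{\hat m,j}}(Y_i) - \mathbb{E}[v_{\varphi_{\hat m,j}}(Y_i)] \big) \Big]^2 = \sup_{t \in B_{\hat m}} \nu_n^2(t) \]

where $\nu_n(t) = \frac{1}{n} \sum_{i=1}^{n} [v_t(Y_i) - \mathbb{E} v_t(Y_i)]$, $B_m = \{ t \in S_m, \|t\| \le 1 \}$.

Then

\[ P\Big( \|f_{\hat m} - \hat f_{\hat m}\|\, 1_{\Omega^*} > \frac{f_0}{4\sqrt{\hat m}} \Big) = P\Big( \sup_{t \in B_{\hat m}} |\nu_n(t)|\, 1_{\Omega^*} > \frac{f_0}{4\sqrt{\hat m}} \Big). \]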



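As previously, we split $\nu_n(t)$ into two terms

\[ \nu_n^*(t) = \frac{1}{2p_n} \sum_{l=0}^{p_n-1} \frac{1}{q_n} \sum_{i=2lq_n+1}^{(2l+1)q_n} [v_t(Y_i^*) - \mathbb{E} v_t(Y_i^*)] + \frac{1}{2p_n} \sum_{l=0}^{p_n-1} \frac{1}{q_n} \sum_{i=(2l+1)q_n+1}^{(2l+2)q_n} [v_t(Y_i^*) - \mathbb{E} v_t(Y_i^*)] \]

and it is sufficient to study

\[ P\Big( \sup_{t \in B_{\hat m}} \Big| \frac{1}{p_n} \sum_{l=0}^{p_n-1} \frac{1}{q_n} \sum_{i=2lq_n+1}^{(2l+1)q_n} [v_t(Y_i^*) - \mathbb{E} v_t(Y_i^*)] \Big| > \frac{f_0}{4\sqrt{\hat m}} \Big). \]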



We use inequality ( fT7l ) in proof of Lemma [5] with 77 = 1 and A 



/o 



P ( sup 

teB rn 



Here, we compute 



Pn (2i+l)q„ 



-E- E «*07) - e[«*(i? 

< exp I —Kp n min 



P» 2=0 gn i=22g n +l 



> 2P + A 



A 2 _\_ 

u ' Mi 



Mi = VA(m); # 2 = 8E^ 



Afm) 



7? 



A(m) 



Thus 



P ( sup \is n (t)\ > 2H + ^= 
vtes™ 8\An 



< 2 exp — iT' min 



n 



Pn 



m) 



Now we use the assumption Vm mA(m) < n/(lnra) 2 . For n large enough, 2PT 
4v /2 E fc Av / ^PI/v / ^ < fo/(SV^). So 

P ( sup \uJt)\ > -^=] < 2exp (-if min ((Inn) 2 , n 1/2 - c Inn)) < 
6.7. Technical Lemmas.
Lemma 3. For each $m \in \mathcal{M}_n$
(1) $\big\| \sum_j \varphi_{m,j}^2 \big\|_\infty = m$

(2) $v_{\varphi_{m,j}}(x) = \frac{\sqrt{m}}{2\pi} \int_{-\pi}^{\pi} e^{-ijv} e^{ixvm}\, [q^*(-vm)]^{-1}\, dv$

(3) $\big\| \sum_j |v_{\varphi_{m,j}}|^2 \big\|_\infty = \Delta(m)$

where A(m) is defined in (jlj). 

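Proof of Lemma 3. First we remark that

\[ \varphi_{m,j}^*(u) = \int e^{-ixu} \sqrt{m}\, \varphi(mx - j)\, dx = \frac{1}{\sqrt{m}} e^{-iju/m} \int e^{-ixu/m} \varphi(x)\, dx = \frac{1}{\sqrt{m}} e^{-iju/m} \varphi^*\Big( \frac{u}{m} \Big). \]

Thus using the inverse Fourier transform

\[ \varphi_{m,j}(x) = \frac{1}{2\pi} \int e^{ixu} \frac{1}{\sqrt{m}} e^{-iju/m} \varphi^*\Big( \frac{u}{m} \Big)\, du = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ijv} \sqrt{m}\, e^{ixvm}\, dv. \]

The Parseval equality yields $\sum_j \varphi_{m,j}^2(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} |\sqrt{m}\, e^{ixvm}|^2\, dv = m$. The first
point is proved. Now we compute $v_{\varphi_{m,j}}(x)$:

\[ v_{\varphi_{m,j}}(x) = \frac{1}{2\pi} \int e^{ixu} \frac{\varphi_{m,j}^*(u)}{q^*(-u)}\, du = \frac{1}{2\pi} \int e^{ixu} \frac{1}{\sqrt{m}} e^{-iju/m} \varphi^*\Big( \frac{u}{m} \Big) \frac{du}{q^*(-u)} = \frac{\sqrt{m}}{2\pi} \int e^{-ijv} e^{ixvm} \frac{\varphi^*(v)}{q^*(-vm)}\, dv. \]

But $\varphi^*(v) = 1_{[-\pi,\pi]}(v)$ and thus the second point is proved. Moreover $v_{\varphi_{m,j}}(x)$ can
be seen as a Fourier coefficient. Parseval's formula then gives

\[ \sum_j |v_{\varphi_{m,j}}(x)|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \Big| \frac{\sqrt{m}\, e^{ixvm}}{q^*(-vm)} \Big|^2 dv = \frac{m}{2\pi} \int_{-\pi}^{\pi} |q^*(-vm)|^{-2}\, dv. \]

Therefore $\big\| \sum_j |v_{\varphi_{m,j}}|^2 \big\|_\infty = \frac{1}{2\pi} \int_{-\pi m}^{\pi m} |q^*(-u)|^{-2}\, du = \Delta(m)$.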
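Point (1) can be checked numerically. Since $\varphi^* = 1_{[-\pi,\pi]}$, the generating function is $\varphi(x) = \sin(\pi x)/(\pi x)$, and the identity $\sum_j \varphi_{m,j}^2(x) = m$ should hold at every $x$. The sketch below truncates the sum to $|j| \le J$; the truncation level and the evaluation points are arbitrary choices made for illustration.

```python
import math


def phi(x: float) -> float:
    """Function whose Fourier transform is the indicator of [-pi, pi]."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def phi_mj(m: float, j: int, x: float) -> float:
    """Basis function phi_{m,j}(x) = sqrt(m) * phi(m * x - j)."""
    return math.sqrt(m) * phi(m * x - j)


def sum_of_squares(m: float, x: float, J: int = 5000) -> float:
    """Truncated version of sum_j phi_{m,j}(x)^2 over |j| <= J."""
    return sum(phi_mj(m, j, x) ** 2 for j in range(-J, J + 1))


for m, x in ((1, 0.2), (3, 0.37), (8, -1.4)):
    print(m, x, sum_of_squares(m, x))  # each value should be close to m
```

The truncated sum is slightly below $m$ because the neglected terms are nonnegative and decay like $1/j^2$.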

Lemma 4. If $q$ verifies $|q^*(x)| \ge k_0 (x^2+1)^{-\gamma/2} \exp(-b|x|^s)$, then

(1) $\Delta(m) \le c_1 (\pi m)^{2\gamma+1-s}\, e^{2b(\pi m)^s}$,

(2) $\Delta_2(m) \le c_2 (\pi m)^{4\gamma+1-s}\, e^{4b(\pi m)^s}$.

Moreover if $|q^*(x)| \le k_1 (x^2+1)^{-\gamma/2} \exp(-b|x|^s)$, then $\Delta(m) \ge c_1' (\pi m)^{2\gamma+1-s}\, e^{2b(\pi m)^s}$.

The proof of this result is omitted. It is obtained by distinguishing the cases
$s > 2\gamma + 1$ and $s \le 2\gamma + 1$ and with standard evaluations of integrals.
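To make the order of $\Delta(m)$ concrete, the following sketch evaluates $\Delta(m) = \frac{1}{2\pi}\int_{-\pi m}^{\pi m} |q^*(u)|^{-2}\,du$ for an illustrative ordinary smooth noise. The choice $q^*(u) = 1/(1+u^2)$ (a Laplace-type noise, so $\gamma = 2$ and $b = s = 0$) is an assumption made for this example only; the integral is computed with a composite Simpson rule and compared with its closed form.

```python
import math


def q_star(u: float) -> float:
    """Characteristic function of the assumed Laplace-type noise: 1 / (1 + u^2)."""
    return 1.0 / (1.0 + u * u)


def delta(m: float, steps: int = 2000) -> float:
    """Delta(m) = (1 / (2*pi)) * integral of |q*(u)|^(-2) over [-pi*m, pi*m].

    Approximated with the composite Simpson rule; steps must be even.
    """
    a = math.pi * m
    h = 2.0 * a / steps
    total = 0.0
    for i in range(steps + 1):
        u = -a + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 == 1 else 2)
        total += weight * abs(q_star(u)) ** (-2)
    return total * h / 3.0 / (2.0 * math.pi)


def delta_closed_form(m: float) -> float:
    """Exact value for this noise: (a + 2a^3/3 + a^5/5) / pi with a = pi * m."""
    a = math.pi * m
    return (a + 2.0 * a ** 3 / 3.0 + a ** 5 / 5.0) / math.pi


for m in (1, 2, 4):
    ratio = delta(m) / (math.pi * m) ** 5  # exponent 2*gamma + 1 - s = 5
    print(m, delta(m), delta_closed_form(m), ratio)
```

The ratio $\Delta(m)/(\pi m)^{5}$ stays bounded above and below as $m$ grows, in line with the two-sided bound of the lemma for $\gamma = 2$, $s = 0$.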

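Lemma 5. Let $T_1, \dots, T_n$ be independent random variables and $\nu_n(r) = (1/n) \sum_{i=1}^{n} [r(T_i) - \mathbb{E}(r(T_i))]$, for $r$ belonging to a countable class $\mathcal{R}$ of measurable functions. Then, for
$\varepsilon > 0$,

\[ \mathbb{E}\Big[ \sup_{r \in \mathcal{R}} |\nu_n(r)|^2 - 2(1+2\varepsilon) H^2 \Big]_+ \le C \Big( \frac{v}{n}\, e^{-K_1 \varepsilon \frac{nH^2}{v}} + \frac{M_1^2}{n^2 C^2(\varepsilon)}\, e^{-K_2 C(\varepsilon) \sqrt{\varepsilon} \frac{nH}{M_1}} \Big) \]

with $K_1 = 1/6$, $K_2 = 1/(21\sqrt{2})$, $C(\varepsilon) = \sqrt{1+\varepsilon} - 1$, $C$ a universal constant and
where

\[ \sup_{r \in \mathcal{R}} \|r\|_\infty \le M_1, \quad \mathbb{E}\Big[ \sup_{r \in \mathcal{R}} |\nu_n(r)| \Big] \le H, \quad \sup_{r \in \mathcal{R}} \frac{1}{n} \sum_{i=1}^{n} \mathrm{Var}(r(T_i)) \le v. \]

Usual density arguments allow us to use this result with a non-countable class of functions $\mathcal{R}$.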

Proof of Lemma 5. We apply the Talagrand concentration inequality given in
Klein and Rio (2005) to the functions $s^i(x) = r(x) - \mathbb{E}(r(T_i))$ and we obtain

\[ P\Big( \sup_{r \in \mathcal{R}} |\nu_n(r)| \ge H + \lambda \Big) \le \exp\Big( - \frac{n \lambda^2}{2(v + 4HM_1) + 6M_1\lambda} \Big). \]

Then we modify this inequality following Birgé and Massart (1998) Corollary 2
p. 354. It gives

\[ P\Big( \sup_{r \in \mathcal{R}} |\nu_n(r)| \ge (1+\eta)H + \lambda \Big) \le \exp\Big( - \frac{n}{3} \min\Big( \frac{\lambda^2}{2v},\ \frac{\min(\eta,1)\lambda}{7M_1} \Big) \Big). \tag{17} \]

To conclude we set $\eta = \sqrt{1+\varepsilon} - 1$ and we use the formula $\mathbb{E}[X]_+ = \int_0^\infty P(X \ge t)\, dt$
with $X = \sup_{r \in \mathcal{R}} |\nu_n(r)|^2 - 2(1+2\varepsilon)H^2$.
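To get a feeling for inequality (17), the following sketch evaluates its right-hand side, assuming it reads $\exp\big(-\frac{n}{3}\min\big(\frac{\lambda^2}{2v}, \frac{\min(\eta,1)\lambda}{7M_1}\big)\big)$. The function name and the numerical values of $n$, $\lambda$, $v$, $M_1$ and $\eta$ are purely illustrative.

```python
import math


def tail_bound(n: int, lam: float, v: float, M1: float, eta: float) -> float:
    """Right-hand side of (17), under the assumed form
    exp(-(n / 3) * min(lam^2 / (2 v), min(eta, 1) * lam / (7 M1)))."""
    gaussian_part = lam ** 2 / (2.0 * v)               # dominates for small deviations
    poisson_part = min(eta, 1.0) * lam / (7.0 * M1)    # dominates for large deviations
    return math.exp(-(n / 3.0) * min(gaussian_part, poisson_part))


# The bound decreases as the sample size n grows (illustrative values).
for n in (100, 1000, 10000):
    print(n, tail_bound(n, lam=0.5, v=2.0, M1=3.0, eta=1.0))
```

For small $\lambda$ the quadratic term $\lambda^2/(2v)$ is the minimum, giving a sub-Gaussian regime, while for large $\lambda$ the linear term takes over.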

Lemma 6. (Viennet (1997)) Let $(T_i)$ be a strictly stationary process with $\beta$-mixing
coefficients $\beta_k$. Then there exists a function $b$ such that

\[ \mathbb{E}[b(T_1)] \le \sum_k \beta_k \quad \text{and} \quad \mathbb{E}[b^2(T_1)] \le 2 \sum_k (k+1) \beta_k \]

and for all function $\psi$ (such that $\mathbb{E}[\psi^2(T_1)] < \infty$) and for all $N$

\[ \mathrm{Var}\Big( \sum_{i=1}^{N} \psi(T_i) \Big) \le 4N\, \mathbb{E}[\psi^2(T_1)\, b(T_1)]. \]